Patent abstract:
Methods, apparatus, and computer-readable media are provided for processing video data in a video coding system implementing ST 2094-10. The video data may include at least two video signals, which may be displayed at the same time in different display regions of a video frame. In various implementations, various techniques can be used to determine an association between a set of color volume parameters and a video signal, and this association can be encoded into a bit stream. By decoding the bit stream, the set of color volume parameters associated with a particular video signal can be used to compress the color volume of the video signal into a range that can be displayed by a particular video display device.
Publication number: BR112019010515A2
Application number: R112019010515
Filing date: 2017-11-30
Publication date: 2019-10-01
Inventors: Krishnan Ramasubramonian Adarsh; Rusanovskyy Dmytro; Sole Rojals Joel
Applicant: Qualcomm Inc.
IPC primary class:
Patent description:

SYSTEMS AND METHODS FOR SIGNALING AND CONSTRAINING A HIGH DYNAMIC RANGE (HDR) VIDEO SYSTEM WITH DYNAMIC METADATA
FIELD [001] This application is related to video systems and methods. More specifically, this application relates to systems and methods for signaling and constraining a High Dynamic Range (HDR) video system with dynamic metadata (for example, ST 2094-10). These systems and methods are applicable to digital video broadcast or Over-The-Top video systems that support signaling of UHD and HDR/WCG video signals, or to any other suitable video system.
BACKGROUND [002] Examples of video coding standards include ITU-T H.261, ISO/IEC MPEG-1 Visual, ITU-T H.262 or ISO/IEC MPEG-2 Visual, ITU-T H.263, ISO/IEC MPEG-4 Visual, and ITU-T H.264 (also known as ISO/IEC MPEG-4 AVC), including its Scalable Video Coding (SVC) and Multiview Video Coding (MVC) extensions.
[003] In addition, a video coding standard, namely High Efficiency Video Coding (HEVC), has been developed by the Joint Collaborative Team on Video Coding (JCT-VC) of the ITU-T Video Coding Experts Group (VCEG) and the ISO/IEC Moving Picture Experts Group (MPEG). The latest HEVC draft specification is available as ITU-T Recommendation H.265: High Efficiency Video Coding (HEVC), http://www.itu.int/rec/T-REC-H.265-201504-I/en.
[004] Following the decoding process, the uncompressed video signal is signaled through a high-speed digital physical interface to an end consumer device, such as a display or a TV. Protocols, requirements, and recommendations for the use of uncompressed digital interfaces by consumer electronics devices such as Digital Televisions (DTVs), digital cable, satellite or terrestrial set-top boxes (STBs), and related peripheral devices including, but not limited to, DVD players/recorders and other related Sources or Sinks are specified in the CTA-861 specification. A recent version of the specification is available at https://standards.cta.tech/kwspub/published_docs/ANSI-CTA-861-F-Preview.pdf.
BRIEF SUMMARY [005] In various implementations, systems and methods are provided for encoding the color volume transform parameters defined by ST 2094-10 into a bit stream. Video can be captured with a large color volume, including a wide dynamic range of colors and a wide color gamut. Video captured in this way attempts to capture the range and depth of colors that can be perceived by human vision. A display device, however, may not be able to display a large color volume. Thus, standards such as ST 2094-10 define parameters for conducting color volume transforms, which can be used to compress a color volume into a more compact form.
[006] Techniques are provided that allow a video coding system to use the parameters defined by ST 2094-10. According to at least one example, a method of processing video data is provided that includes receiving the video data, wherein the video data includes at least two video signals. The method also includes obtaining one or more sets of color volume transform parameters from the video data. The method further includes determining a display region for each of the at least two video signals, wherein the display region determines a portion of a video frame in which the at least two video signals will be displayed. The method further includes determining, for each of the at least two video signals, an association between a video signal among the at least two video signals and a set of color volume transform parameters among the one or more sets of color volume transform parameters, wherein the set of color volume transform parameters determines one or more display parameters for a display region for the video signal. The method also includes generating one or more metadata blocks for the one or more sets of color volume transform parameters. The method further includes generating an encoded bit stream for the video data, wherein the encoded bit stream includes the one or more metadata blocks. The method further includes encoding, in the encoded bit stream, the determined association between the at least two video signals and the one or more sets of color volume parameters.
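As a concrete illustration of the association step described in the preceding paragraph, the following Python sketch pairs each video signal's display region with a parameter set. All type and field names here are hypothetical illustrations, not syntax elements of ST 2094-10 or of any codec API.

```python
# Hypothetical sketch of associating display regions with color volume
# parameter sets; none of these names come from ST 2094-10 itself.
from dataclasses import dataclass
from typing import List

@dataclass
class DisplayRegion:
    x: int       # left edge of the region within the video frame
    y: int       # top edge of the region within the video frame
    width: int
    height: int

@dataclass
class ColorVolumeParams:
    params_id: int        # identifies this parameter set in the metadata
    min_luminance: float  # example fields only
    max_luminance: float

def associate_regions(regions: List[DisplayRegion],
                      param_sets: List[ColorVolumeParams]) -> List[int]:
    """Return, for the i-th video signal's region, the id of the
    parameter set associated with it. Here the association is purely
    positional, mirroring an ordering-based signaling scheme."""
    if len(param_sets) < len(regions):
        raise ValueError("each display region needs a parameter set")
    return [param_sets[i].params_id for i in range(len(regions))]
```

An explicit-index scheme would instead carry the `params_id` value for each region in the bit stream, as discussed in paragraph [0011] below.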
[007] In another example, an apparatus is provided that includes a memory configured to store video data including at least two video signals, and a processor. The processor is configured to and can obtain one or more sets of color volume transform parameters from the video data. The processor is configured to and can determine a display region for each of the at least two video signals, wherein the display region determines a portion of a video frame in which the at least two video signals will be displayed. The processor is configured to and can determine, for each of the at least two video signals, a respective association between a video signal among the at least two video signals and a set of color volume transform parameters among the one or more sets of color volume transform parameters, wherein the set of color volume transform parameters determines one or more display parameters for a display region for the video signal. The processor is configured to and can generate one or more metadata blocks for the one or more sets of color volume transform parameters. The processor is configured to and can generate an encoded bit stream for the video data, wherein the encoded bit stream includes the one or more metadata blocks. The processor is configured to and can encode, in the encoded bit stream, the respective determined associations between the at least two video signals and the one or more sets of color volume parameters.
[008] In another example, a computer-readable medium is provided having stored thereon instructions that, when executed by a processor, perform a method that includes: receiving the video data, wherein the video data includes at least two video signals. The method further includes obtaining one or more sets of color volume transform parameters from the video data. The method further includes determining a display region for each of the at least two video signals, wherein the display region determines a portion of a video frame in which the at least two video signals will be displayed. The method further includes determining, for each of the at least two video signals, a respective association between a video signal among the at least two video signals and a set of color volume transform parameters among the one or more sets of color volume transform parameters, wherein the set of color volume transform parameters determines one or more display parameters for a display region for the video signal. The method further includes generating one or more metadata blocks for the one or more sets of color volume transform parameters. The method further includes generating an encoded bit stream for the video data, wherein the encoded bit stream includes the one or more metadata blocks. The method further includes encoding, in the encoded bit stream, the respective determined associations between the at least two video signals and the one or more sets of color volume parameters.
[009] In another example, an apparatus is provided that includes means for receiving video data, wherein the video data includes at least two video signals. The apparatus further comprises means for obtaining one or more sets of color volume transform parameters from the video data. The apparatus further comprises means for determining a display region for each of the at least two video signals, wherein the display region determines a portion of a video frame in which the at least two video signals will be displayed. The apparatus further comprises means for determining, for each of the at least two video signals, a respective association between a video signal among the at least two video signals and a set of color volume transform parameters among the one or more sets of color volume transform parameters, wherein the set of color volume transform parameters determines one or more display parameters for a display region for the video signal. The apparatus further comprises means for generating one or more metadata blocks for the one or more sets of color volume transform parameters. The apparatus further comprises means for generating an encoded bit stream for the video data, wherein the encoded bit stream includes the one or more metadata blocks. The apparatus further comprises means for encoding, in the encoded bit stream, the respective determined associations between the at least two video signals and the one or more sets of color volume parameters.
[0010] In some aspects, encoding the respective determined associations between the at least two video signals and the one or more sets of color volume parameters includes placing the one or more metadata blocks in the encoded bit stream according to an order of the display regions within the video frame.
[0011] In some aspects, encoding the respective determined associations between the at least two video signals and the one or more sets of color volume parameters includes inserting, into the encoded bit stream, one or more values that each indicate a respective determined association.
[0012] In some aspects, a first display region for a first video signal among the at least two video signals overlaps a second display region for a second video signal among the at least two video signals, and wherein a set of color volume transform parameters among the one or more sets of color volume transform parameters for use in the overlapping region is determined by a priority between the first display region and the second display region. In some aspects, the priority is based on an order in which the first display region and the second display region are displayed in the video frame. In some aspects, the priority is based on a value provided by the video data.
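As a rough illustration of the priority rule just described, the sketch below picks the parameter set for a pixel that falls inside overlapping regions. The representation of regions and priorities is an assumption for illustration, not taken from the standard; regions are assumed to expose the x, y, width, and height attributes of the earlier sketch.

```python
# Hedged sketch: resolve overlapping display regions by priority.
# Each entry is (priority, region, params_id); the highest-priority
# region covering the pixel wins.
def params_for_pixel(x, y, prioritized_regions):
    for priority, region, params_id in sorted(
            prioritized_regions, key=lambda entry: -entry[0]):
        inside = (region.x <= x < region.x + region.width and
                  region.y <= y < region.y + region.height)
        if inside:
            return params_id
    return None  # pixel lies outside every declared display region
```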
[0013] In some aspects, the one or more metadata blocks are encoded in one or more Supplemental Enhancement Information (SEI) Network Abstraction Layer (NAL) units.
[0014] According to at least one example, a method of processing video data is provided that includes receiving an encoded bit stream, wherein the encoded bit stream includes at least two encoded video signals and one or more metadata blocks that include one or more sets of color volume transform parameters. The method further includes determining a display region for each of the at least two encoded video signals. The method further includes determining, for each of the at least two video signals, an association between a video signal among the at least two video signals and a set of color volume transform parameters among the one or more sets of color volume transform parameters. The method further includes decoding the at least two encoded video signals using an associated set of color volume transform parameters, wherein the associated set of color volume transform parameters determines one or more display parameters for a corresponding display region.
[0015] In another example, an apparatus is provided that includes a memory configured to store video data and a processor. The processor is configured to and can receive an encoded bit stream, wherein the encoded bit stream includes at least two encoded video signals and one or more metadata blocks that include one or more sets of color volume transform parameters. The processor is configured to and can determine a display region for each of the at least two encoded video signals. The processor is configured to and can determine, for each of the at least two encoded video signals, an association between a video signal among the at least two encoded video signals and a set of color volume transform parameters among the one or more sets of color volume transform parameters. The processor is configured to and can decode the at least two encoded video signals using an associated set of color volume transform parameters, wherein the associated set of color volume transform parameters determines one or more display parameters for a corresponding display region.
[0016] In another example, a computer-readable medium is provided having stored thereon instructions that, when executed by a processor, perform a method that includes: receiving an encoded bit stream, wherein the encoded bit stream includes at least two encoded video signals and one or more metadata blocks that include one or more sets of color volume transform parameters. The method further includes determining a display region for each of the at least two encoded video signals. The method further includes determining, for each of the at least two video signals, an association between a video signal among the at least two video signals and a set of color volume transform parameters among the one or more sets of color volume transform parameters. The method also includes decoding the at least two encoded video signals using an associated set of color volume transform parameters, wherein the associated set of color volume transform parameters determines one or more display parameters for a corresponding display region.
[0017] In another example, an apparatus is provided that includes means for receiving an encoded bit stream, wherein the encoded bit stream includes at least two encoded video signals and one or more metadata blocks that include one or more sets of color volume transform parameters. The apparatus further comprises means for determining a display region for each of the at least two encoded video signals. The apparatus further comprises means for determining, for each of the at least two encoded video signals, an association between a video signal among the at least two encoded video signals and a set of color volume transform parameters among the one or more sets of color volume transform parameters. The apparatus further comprises means for decoding the at least two encoded video signals using an associated set of color volume transform parameters, wherein the associated set of color volume transform parameters determines one or more display parameters for a corresponding display region.
[0018] In some aspects, the associations between the at least two video signals and the one or more sets of color volume transform parameters are based on an order of the display regions.
[0019] In some aspects, the associations between the at least two video signals and the one or more sets of color volume transform parameters are based on one or more values included in the encoded bit stream.
[0020] In some aspects, a first display region for a first video signal among the at least two video signals overlaps a second display region for a second video signal among the at least two video signals, and wherein a set of color volume transform parameters among the one or more sets of color volume transform parameters for use in the overlapping region is determined by a priority between the first display region and the second display region. In some aspects, the priority is based on an order in which the first display region and the second display region are displayed in the video frame. In some aspects, the priority is based on a value provided by the video data.
[0021] In some aspects, the one or more metadata blocks are encoded in one or more Supplemental Enhancement Information (SEI) Network Abstraction Layer (NAL) units.
[0022] According to at least one example, a method of processing video data is provided that includes receiving the video data, wherein the video data is associated with a color volume. The method also includes obtaining a set of color volume transform parameters from the video data, wherein the set of color volume transform parameters can be used to transform the color volume. The method also includes obtaining a set of mastering display color volume parameters, wherein the set of mastering display color volume parameters includes values determined when generating a master copy of the video data. The method also includes generating one or more metadata blocks for the set of color volume transform parameters. The method further includes generating one or more additional metadata blocks for the set of mastering display color volume parameters. The method further includes generating an encoded bit stream for the video data, wherein the encoded bit stream includes the one or more metadata blocks and the one or more additional metadata blocks, wherein the inclusion of the one or more additional metadata blocks is required by the presence of the one or more metadata blocks in the encoded bit stream.
[0023] In another example, an apparatus is provided that includes a memory configured to store video data that includes a color volume, and a processor. The processor is configured to and can obtain a set of color volume transform parameters from the video data, wherein the set of color volume transform parameters can be used to transform the color volume. The processor is configured to and can obtain a set of mastering display color volume parameters, wherein the set of mastering display color volume parameters includes values determined when generating a master copy of the video data. The processor is configured to and can generate one or more metadata blocks for the set of color volume transform parameters. The processor is configured to and can generate one or more additional metadata blocks for the set of mastering display color volume parameters. The processor is configured to and can generate an encoded bit stream for the video data, wherein the encoded bit stream includes the one or more metadata blocks and the one or more additional metadata blocks, wherein the inclusion of the one or more additional metadata blocks is required by the presence of the one or more metadata blocks in the encoded bit stream.
[0024] In another example, a computer-readable medium is provided having stored thereon instructions that, when executed by a processor, perform a method that includes: receiving the video data, wherein the video data includes a color volume. The method also includes obtaining a set of color volume transform parameters from the video data, wherein the set of color volume transform parameters can be used to transform the color volume. The method also includes obtaining a set of mastering display color volume parameters, wherein the set of mastering display color volume parameters includes values determined when generating a master copy of the video data. The method also includes generating one or more metadata blocks for the set of color volume transform parameters. The method also includes generating one or more additional metadata blocks for the set of mastering display color volume parameters. The method further includes generating an encoded bit stream for the video data, wherein the encoded bit stream includes the one or more metadata blocks and the one or more additional metadata blocks, wherein the inclusion of the one or more additional metadata blocks is required by the presence of the one or more metadata blocks in the encoded bit stream.
[0025] In another example, an apparatus is provided that includes means for receiving video data, wherein the video data includes a color volume. The apparatus further comprises means for obtaining a set of color volume transform parameters from the video data, wherein the set of color volume transform parameters can be used to transform the color volume. The apparatus further comprises means for obtaining a set of mastering display color volume parameters, wherein the set of mastering display color volume parameters includes values determined when generating a master copy of the video data. The apparatus further comprises means for generating one or more metadata blocks for the set of color volume transform parameters. The apparatus further comprises means for generating one or more additional metadata blocks for the set of mastering display color volume parameters. The apparatus further comprises means for generating an encoded bit stream for the video data, wherein the encoded bit stream includes the one or more metadata blocks and the one or more additional metadata blocks, wherein the inclusion of the one or more additional metadata blocks is required by the presence of the one or more metadata blocks in the encoded bit stream.
[0026] In some aspects, the set of color volume transform parameters includes a transfer characteristic, and wherein, in the encoded bit stream, the one or more metadata blocks are excluded when the transfer characteristic does not correspond to a particular value.
[0027] In some aspects, the set of color volume transform parameters and the set of mastering display color volume parameters include a same field, and wherein the field is omitted from the one or more metadata blocks based on the field being present in the one or more additional metadata blocks.
[0028] In some aspects, the video data includes a plurality of processing windows, and wherein, in the encoded bit stream, a quantity of the plurality of processing windows is constrained to a value between one and sixteen.
[0029] In some aspects, the video data includes a plurality of content description elements, and wherein, in the encoded bit stream, a quantity of the plurality of content description elements is constrained to one.
[0030] In some aspects, the video data includes a plurality of target display elements, and wherein, in the encoded bit stream, a quantity of the plurality of target display elements is constrained to a value between one and sixteen.
[0031] In some aspects, the encoded bit stream includes at least one metadata block for each access unit in the encoded bit stream, the metadata block including color volume transform parameters.
[0032] In some aspects, values defined as reserved are excluded from the encoded bit stream.
[0033] In some aspects, the one or more metadata blocks each include a length value, and wherein, in the encoded bit stream, the length value is constrained to a multiple of eight.
[0034] In some aspects, the one or more metadata blocks each include a length value, and wherein, in the encoded bit stream, the length value is constrained to a value between 0 and 255.
[0035] In some aspects, the one or more metadata blocks are encoded in one or more Supplemental Enhancement Information (SEI) Network Abstraction Layer (NAL) units.
[0036] In some aspects, the one or more additional metadata blocks are encoded in one or more Supplemental Enhancement Information (SEI) Network Abstraction Layer (NAL) units.
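The constraints in paragraphs [0028] through [0034] can be read as a small conformance checklist. The sketch below expresses them as one hypothetical validation routine; the attribute names are invented stand-ins for the corresponding syntax elements, not actual ST 2094-10 names.

```python
# Hypothetical conformance checks mirroring the constraints above.
def check_metadata_constraints(meta) -> list:
    errors = []
    if not 1 <= meta.num_processing_windows <= 16:
        errors.append("processing windows must number between 1 and 16")
    if meta.num_content_description_elements != 1:
        errors.append("exactly one content description element is allowed")
    if not 1 <= meta.num_target_display_elements <= 16:
        errors.append("target display elements must number between 1 and 16")
    for block in meta.metadata_blocks:
        if block.length % 8 != 0:
            errors.append("metadata block length must be a multiple of 8")
        if not 0 <= block.length <= 255:
            errors.append("metadata block length must be between 0 and 255")
    return errors  # an empty list means the metadata passed these checks
```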
[0037] According to at least one example, a method of processing video data is provided that includes receiving an encoded bit stream, wherein the encoded bit stream includes one or more metadata blocks that include an encoded set of color volume transform parameters. The method further includes determining the presence of the one or more metadata blocks in the encoded bit stream. The method further includes, based on determining the presence of the one or more metadata blocks in the encoded bit stream, determining that a presence of one or more additional blocks is required in the encoded bit stream. The method further includes determining that the encoded bit stream does not include one or more additional metadata blocks that include an encoded set of mastering display color volume parameters. The method further includes determining, based on the encoded bit stream not including the one or more additional metadata blocks, that the encoded bit stream does not conform to the requirement. The method further includes not processing at least a portion of the encoded bit stream based on the determination that the encoded bit stream does not conform to the requirement.
[0038] In another example, an apparatus is provided that includes a memory configured to store video data and a processor. The processor is configured to and can receive an encoded bit stream, wherein the encoded bit stream includes one or more metadata blocks that include an encoded set of color volume transform parameters. The processor is configured to and can determine the presence of the one or more metadata blocks in the encoded bit stream. The processor is configured to and can, based on determining the presence of the one or more metadata blocks in the encoded bit stream, determine that a presence of one or more additional blocks is required in the encoded bit stream. The processor is configured to and can determine that the encoded bit stream does not include one or more additional metadata blocks that include an encoded set of mastering display color volume parameters. The processor is configured to and can determine, based on the encoded bit stream not including the one or more additional metadata blocks, that the encoded bit stream does not conform to the requirement. The processor is configured to not process at least a portion of the encoded bit stream based on the determination that the encoded bit stream does not conform to the requirement.
[0039] In another example, a computer-readable medium is provided having stored thereon instructions that, when executed by a processor, perform a method that includes: receiving an encoded bit stream, wherein the encoded bit stream includes one or more metadata blocks that include an encoded set of color volume transform parameters. The method further includes determining the presence of the one or more metadata blocks in the encoded bit stream. The method further includes, based on determining the presence of the one or more metadata blocks in the encoded bit stream, determining that a presence of one or more additional blocks is required in the encoded bit stream. The method further includes determining that the encoded bit stream does not include one or more additional metadata blocks that include an encoded set of mastering display color volume parameters. The method further includes determining, based on the encoded bit stream not including the one or more additional metadata blocks, that the encoded bit stream does not conform to the requirement. The method further includes not processing at least a portion of the encoded bit stream based on the determination that the encoded bit stream does not conform to the requirement.
[0040] In another example, an apparatus is provided that includes means for receiving an encoded bit stream, wherein the encoded bit stream includes one or more metadata blocks that include an encoded set of color volume transform parameters. The apparatus further comprises means for determining the presence of the one or more metadata blocks in the encoded bit stream. The apparatus further comprises means for, based on determining the presence of the one or more metadata blocks in the encoded bit stream, determining that a presence of one or more additional blocks is required in the encoded bit stream. The apparatus further comprises means for determining that the encoded bit stream does not include one or more additional metadata blocks that include an encoded set of mastering display color volume parameters. The apparatus further comprises means for determining, based on the encoded bit stream not including the one or more additional metadata blocks, that the encoded bit stream does not conform to the requirement. The apparatus further comprises means for not processing at least a portion of the encoded bit stream based on the determination that the encoded bit stream does not conform to the requirement.
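A decoder-side reading of paragraphs [0037] through [0040] is sketched below: the presence of color volume transform metadata makes mastering display metadata mandatory, and a stream violating that requirement is left unprocessed. The block kinds and names are illustrative assumptions, not actual syntax.

```python
# Sketch of the presence requirement: color volume transform metadata
# blocks demand companion mastering display blocks; otherwise skip.
def process_metadata(metadata_blocks):
    has_transform_params = any(
        b.kind == "color_volume_transform" for b in metadata_blocks)
    has_mastering_display = any(
        b.kind == "mastering_display" for b in metadata_blocks)
    if has_transform_params and not has_mastering_display:
        # Bit stream does not conform to the requirement; do not process.
        return None
    return metadata_blocks  # conforming: hand the blocks to the decoder
```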
[0041] In some aspects, the encoded set of color volume transform parameters includes a transfer characteristic. In these aspects, the methods, apparatuses, and computer-readable media described above further comprise determining that a value of the transfer characteristic is a particular value. In these aspects, the determination that the encoded bit stream is non-conforming is further based on the one or more metadata blocks being included in the encoded bit stream when the value of the transfer characteristic is the particular value.
[0042] In some aspects, the encoded set of color volume transform parameters and the set of mastering display color volume parameters include a same field, and wherein the determination that the encoded bit stream is non-conforming is further based on the field being present in both the one or more metadata blocks and the one or more additional metadata blocks.
[0043] In some aspects, the encoded set of color volume transform parameters and the set of mastering display color volume parameters include a same field, wherein the field is omitted from the one or more metadata blocks. In these aspects, the methods, apparatuses, and computer-readable media described above further comprise decoding the set of color volume parameters, wherein the decoding includes using a value for the field from the encoded set of mastering display color volume parameters.
[0044] In some aspects, the video data includes a plurality of processing windows, and wherein the determination that the encoded bit stream is non-conforming is further based on a quantity of the plurality of processing windows being greater than sixteen.
[0045] In some aspects, the video data includes a plurality of content description elements, and wherein the determination that the encoded bit stream is non-conforming is further based on a quantity of the plurality of content description elements being greater than one.
[0046] In some aspects, the video data includes a plurality of target display elements, and wherein the determination that the encoded bit stream is non-conforming is further based on a quantity of the plurality of target display elements being greater than sixteen.
[0047] In some aspects, the methods, apparatuses, and computer-readable media described above further comprise determining that the encoded bit stream does not include a metadata block for a particular access unit in the encoded bit stream, wherein the determination that the encoded bit stream is non-conforming is further based on the encoded bit stream not including a metadata block for the particular access unit.
[0048] In some aspects, the methods, apparatuses, and computer-readable media described above further comprise determining that the encoded bit stream includes a reserved value, wherein the determination that the encoded bit stream is non-conforming is further based on the encoded bit stream including a reserved value.
[0049] In some aspects, the one or more metadata blocks each include a length value, and the determination that the encoded bit stream is non-conforming is further based on the length value not being a multiple of eight.
[0050] In some aspects, the one or more metadata blocks each include a length value, and wherein the determination that the encoded bit stream is non-conforming is further based on the length value being greater than 255.
[0051] In some aspects, the one or more metadata blocks are encoded in one or more Supplemental Enhancement Information (SEI) Network Abstraction Layer (NAL) units.
[0052] In some aspects, the one or more additional metadata blocks are encoded in one or more Supplemental Enhancement Information (SEI) Network Abstraction Layer (NAL) units.
[0053] This summary is not intended to identify key or essential features of the claimed subject matter, nor is it intended to be used in isolation to determine the scope of the claimed subject matter. The subject matter should be understood by reference to appropriate portions of the entire specification of this patent, any or all drawings, and each claim.
[0054] The foregoing, together with other features and embodiments, will become more apparent upon reference to the following specification, claims, and accompanying drawings.
BRIEF DESCRIPTION OF THE DRAWINGS [0055] Illustrative examples of the various implementations are described in detail below with reference to the following drawing figures:
[0056] Figure 1 is a block diagram illustrating an example of a video coding system that includes an encoding device 104 and a decoding device 112.
[0057] Figure 2 illustrates the dynamic range of typical human vision, compared with the dynamic range of various display types.
[0058] Figure 3 illustrates an example of a chromaticity diagram, overlaid with a triangle representing an SDR color gamut and a triangle representing an HDR color gamut.
[0059] Figure 4 illustrates an example of a process for converting high precision linear RGB video data to HDR data.
[0060] Figure 5 illustrates examples of luminance curves produced by transfer functions defined by various standards.
[0061] Figure 6 illustrates an example of processing blocks that can be used in implementations of ST 2094-10.
[0062] Figure 7 is an example of a process for processing video data.
[0063] Figure 8 is an example of a process for processing video data.
[0064] Figure 9 is an example of a process for processing video data.
[0065] Figure 10 is an example of a process for processing video data.
[0066] Figure 11 is a block diagram illustrating an exemplary encoding device.
[0067] Figure 12 is a block diagram illustrating an exemplary decoding device.
DETAILED DESCRIPTION [0068] Certain aspects and embodiments of the present invention are provided below. Some of these aspects and embodiments can be applied independently and some of them can be applied in combination, as would be apparent to those skilled in the art. In the following description, for purposes of explanation, specific details are set forth in order to provide a thorough understanding of the embodiments of the invention. However, it will be apparent that various embodiments can be practiced without these specific details. The figures and description are not intended to be restrictive.
[0069] The following description provides exemplary embodiments only, and is not intended to limit the scope, applicability, or configuration of the disclosure. Rather, the following description of the exemplary embodiments will provide those skilled in the art with an enabling description for implementing an exemplary embodiment. It should be understood that various changes can be made in the function and arrangement of elements without departing from the spirit and scope of the invention as set forth in the appended claims.
[0070] Specific details are given in the description below to provide a thorough understanding of the embodiments. However, it will be understood by one of ordinary skill in the art that the embodiments may be practiced without these specific details. For example, circuits, systems, networks, processes, and other components may be shown as components in block diagram form in order not to obscure the embodiments in unnecessary detail. In other instances, well-known circuits, processes, algorithms, structures, and techniques may be shown without unnecessary detail in order to avoid obscuring the embodiments.
[0071] Also, it is noted that individual embodiments may be described as a process that is depicted as a flowchart, a flow diagram, a data flow diagram, a structure diagram, or a block diagram. Although a flowchart may describe the operations as a sequential process, many of the operations can be performed in parallel or concurrently. In addition, the order of the operations may be rearranged. A process is terminated when its operations are completed, but it could have additional steps not included in a figure. A process may correspond to a method, a function, a procedure, a subroutine, a subprogram, etc. When a process corresponds to a function, its termination can correspond to a return of the function to the calling function or the main function.
[0072] The term "computer-readable medium" includes, but is not limited to, portable or non-portable storage devices, optical storage devices, and various other media capable of storing, containing, or carrying instruction(s) and/or data. A computer-readable medium may include a non-transitory medium in which data can be stored and that does not include carrier waves and/or transitory electronic signals propagating wirelessly or over wired connections. Examples of a non-transitory medium may include, but are not limited to, a magnetic disk or tape, optical storage media such as compact disk (CD) or digital versatile disk (DVD), flash memory, memory, or memory devices. A computer-readable medium may have stored thereon code and/or machine-executable instructions that may represent a procedure, a function, a subprogram, a program, a routine, a subroutine, a module, a software package, a class, or any combination of instructions, data structures, or program statements. A code segment may be coupled to another code segment or a hardware circuit by passing and/or receiving information, data, arguments, parameters, or memory contents. Information, arguments, parameters, data, etc. may be passed, forwarded, or transmitted via any suitable means including memory sharing, message passing, token passing, network transmission, or the like.
[0073] Furthermore, embodiments may be implemented by hardware, software, firmware, middleware, microcode, hardware description languages, or any combination thereof. When implemented in software, firmware, middleware, or microcode, the program code or code segments to perform the necessary tasks (e.g., a computer program product) may be stored in a computer-readable or machine-readable medium. A processor(s) may perform the necessary tasks.
[0074] As more devices and systems provide consumers with the ability to consume digital video data, the need for efficient video coding techniques becomes more important. Video coding is needed to reduce the storage and transmission requirements necessary to handle the large amounts of data present in digital video data. Various video coding techniques may be used to compress video data in a manner that uses a lower bit rate while maintaining high video quality. As used herein, "coding" refers to "encoding" and "decoding".
[0075] Figure 1 is a block diagram illustrating an example of a video coding system 100 that includes an encoding device 104 and a decoding device 112. In some examples, the video coding system 100 may be a High Dynamic Range (HDR) system, such that the encoding device 104 can receive HDR video signals and can produce a bit stream for the HDR video signals, and the decoding device 112 can decode the bit stream into an HDR video signal that can be output. The encoding device 104 may be part of a source device, and the decoding device 112 may be part of a receiving device. The source device and/or the receiving device may include an electronic device, such as a mobile or stationary telephone handset (e.g., a smartphone, a cellular telephone, or the like), a desktop computer, a laptop or notebook computer, a tablet computer, a set-top box, a television, a camera, a display device, a digital media player, a video gaming console, a video streaming device, an Internet Protocol (IP) camera, or any other suitable electronic device. In some examples, the source device and the receiving device may include one
or more wireless transceivers for wireless communications. The coding techniques described herein are applicable to video coding in various multimedia applications, including streaming video transmissions (e.g., over the Internet), television broadcasts or transmissions, encoding of digital video for storage on a data storage medium, decoding of digital video stored on a data storage medium, or other applications. In some examples, the system 100 can support one-way or two-way video transmission to support applications such as video conferencing, video streaming, video playback, video broadcasting, gaming, and/or video telephony.
[0076] The encoding device 104 (or encoder) can be used to encode video data using a video coding standard or protocol to generate an encoded video bit stream. Examples of video coding standards include ITU-T H.261, ISO/IEC MPEG-1 Visual, ITU-T H.262 or ISO/IEC MPEG-2 Visual, ITU-T H.263, ISO/IEC MPEG-4 Visual, ITU-T H.264 (also known as ISO/IEC MPEG-4 AVC), including its Scalable Video Coding (SVC) and Multiview Video Coding (MVC) extensions, and High Efficiency Video Coding (HEVC) or ITU-T H.265. Various extensions to HEVC dealing with multi-layer video coding exist, including the range and screen content coding extensions, 3D video coding (3D-HEVC), the multiview extension (MV-HEVC), and the scalable extension (SHVC). HEVC and its extensions have been developed by the Joint Collaboration Team on Video Coding (JCT-VC) as well as the Joint Collaboration Team on 3D Video Coding Extension Development (JCT-3V) of the ITU-T Video Coding Experts Group (VCEG) and the ISO/IEC Motion Picture Experts Group (MPEG). MPEG and ITU-T VCEG have also formed a Joint Exploration Video Team (JVET) to explore new coding tools for the next generation of video coding standards. The reference software is called JEM (joint exploration model).
[0077] Many examples described herein provide examples using the JEM model, the HEVC standard, and/or extensions thereof. However, the techniques and systems described herein may also be applicable to other coding standards, such as AVC, MPEG, extensions thereof, or other suitable coding standards that currently exist or future coding standards. Accordingly, while the techniques and systems described herein may be described with reference to a particular video coding standard, one of ordinary skill in the art will appreciate that the description should not be interpreted to apply only to that particular standard.
[0078] Referring to Figure 1, a video source 102 can provide the video data to the encoding device 104. The video source 102 may be part of the source device, or may be part of a device other than the source device. The video source 102 may include a video capture device (e.g., a video camera, a camera phone, a video phone, or the like), a video archive containing stored video, a video server or content provider providing video data, a video feed interface receiving video from a video server or content provider, a computer graphics system for generating computer graphics video data, a combination of such sources, or any other suitable video source.
[0079] The video data from the video source 102 may include one or more input images or frames. An image or frame of a video is a still image of a scene. The encoder engine 106 (or encoder) of the encoding device 104 encodes the video data to generate an encoded video bit stream. In some examples, an encoded video bit stream (or "video bit stream" or "bit stream") is a series of one or more coded video sequences. A coded video sequence (CVS) includes a series of access units (AUs) starting with an AU that has a random access point image in the base layer and with certain properties, up to and not including a next AU that has a random access point image in the base layer and with certain properties. For example, the certain properties of a random access point image that starts a CVS may include a RASL flag (e.g., NoRaslOutputFlag) equal to 1. Otherwise, a random access point image (with RASL flag equal to 0) does not start a CVS. An access unit (AU) includes one or more encoded images and control information corresponding to the encoded images that share the same output time. Coded slices of images are encapsulated at the bit stream level into data units called network abstraction layer (NAL) units. For example, an HEVC video bit stream may include one or more CVSs including NAL units. Each of the NAL units has a NAL unit header. In one example, the header is one byte for H.264/AVC (except for multi-layer extensions) and two bytes for HEVC. The syntax elements in the NAL unit header take the designated bits and are therefore visible to all kinds of systems and transport layers, such as Transport Stream, Real-time Transport Protocol (RTP), and File Format, among others.
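Since the paragraph above notes that an HEVC NAL unit header is two bytes, here is a short sketch of how those sixteen bits break down in the published H.265 syntax (forbidden_zero_bit, nal_unit_type, nuh_layer_id, nuh_temporal_id_plus1); this follows the H.265 specification rather than anything specific to this application.

```python
# Parse the two-byte HEVC (H.265) NAL unit header.
def parse_hevc_nal_header(b0: int, b1: int) -> dict:
    return {
        "forbidden_zero_bit": (b0 >> 7) & 0x01,    # must be 0
        "nal_unit_type": (b0 >> 1) & 0x3F,         # 6 bits, e.g. 39 = prefix SEI
        "nuh_layer_id": ((b0 & 0x01) << 5) | ((b1 >> 3) & 0x1F),  # 6 bits
        "nuh_temporal_id_plus1": b1 & 0x07,        # 3 bits, always >= 1
    }
```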
[0080] Two classes of NAL units exist in the HEVC standard, including video coding layer (VCL) NAL units and non-VCL NAL units. A VCL NAL unit includes one slice or slice segment (described below) of coded image data, and a non-VCL NAL unit includes control information that relates to one or more coded images. In some cases, a NAL unit can be referred to as a packet. An HEVC AU includes VCL NAL units containing coded image data and non-VCL NAL units (if any) corresponding to the coded image data.
[0081] NAL units may contain a sequence of bits forming a coded representation of the video data (e.g., an encoded video bit stream, a CVS of a bit stream, or the like), such as coded representations of images in a video. The encoder engine 106 generates coded representations of images by partitioning each image into multiple slices. A slice is independent of other slices, so the information in the slice is coded without dependency on data from other slices within the same image. A slice includes one or more slice segments, including an independent slice segment and, if present, one or more dependent slice segments that depend on previous slice segments. The slices are then partitioned into coding tree blocks (CTBs) of luma samples and chroma samples. A CTB of luma samples and one or more CTBs of chroma samples, along with syntax for the samples, are referred to as a coding tree unit (CTU). A CTU is the basic processing unit for HEVC encoding. A CTU can be split into multiple coding units (CUs) of varying sizes. A CU contains luma and chroma sample arrays that are referred to as coding blocks (CBs).
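The CTU-to-CU splitting described above is a recursive quadtree. The toy sketch below generates the leaf CUs of a CTU; in a real bit stream the split decision comes from decoded split flags, for which the `should_split` callback is a stand-in.

```python
# Toy recursive quadtree split of a CTU into leaf CUs.
def split_ctu(x, y, size, min_cu_size, should_split):
    """Yield (x, y, size) for each leaf CU. `should_split` stands in
    for the per-node split decision signaled in the bit stream."""
    if size > min_cu_size and should_split(x, y, size):
        half = size // 2
        for dy in (0, half):
            for dx in (0, half):
                yield from split_ctu(x + dx, y + dy, half,
                                     min_cu_size, should_split)
    else:
        yield (x, y, size)

# Example: splitting every block larger than 32 samples in a 64 x 64
# CTU produces four 32 x 32 CUs.
cus = list(split_ctu(0, 0, 64, 8, lambda x, y, s: s > 32))
```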
[0082] The luma and chroma CBs can be further split into prediction blocks (PBs). A PB is a block of samples of the luma component or a chroma component that uses the same motion parameters for inter-prediction or intra-block copy prediction (when available or enabled for use). The luma PB and one or more chroma PBs, together with associated syntax, form a prediction unit (PU). For inter-prediction, a set of motion parameters (e.g., one or more motion vectors, reference indices, or the like) is signaled in the bit stream for each PU and is used for inter-prediction of the luma PB and the one or more chroma PBs. The motion parameters can also be referred to as motion information. A CB can also be partitioned into one or more transform blocks (TBs). A TB represents a square block of samples of a color component on which the same two-dimensional transform is applied for coding a prediction residual signal. A transform unit (TU) represents the TBs of luma and chroma samples, and corresponding syntax elements.
[0083] A size of a CU corresponds to a size of the coding mode and may be square in shape. For example, a size of a CU may be 8 x 8 samples, 16 x 16 samples, 32 x 32 samples, 64 x 64 samples, or any other appropriate size up to the size of the corresponding CTU. The phrase "N x N" is used herein to refer to the pixel dimensions of a video block in terms of vertical and horizontal dimensions (e.g., 8 pixels x 8 pixels). The pixels in a block may be arranged in rows and columns. In some examples, blocks may not have the same number of pixels in a horizontal direction as in a vertical direction. Syntax data associated with a CU may describe, for example, partitioning of the CU into one or more PUs. Partitioning modes may differ between whether the CU is intra-prediction mode coded or inter-prediction mode coded. PUs may be partitioned to be non-square in shape. Syntax data associated with a CU may also describe, for example, partitioning of the CU into one or more TUs according to a CTU. A TU can be square or non-square in shape.
[0084] According to the HEVC standard, transformations may be performed using transform units (TUs). TUs may vary for different CUs. The TUs may be sized based on the size of PUs within a given CU. The TUs may be the same size or smaller than the PUs. In some examples, residual samples corresponding to a CU may be subdivided into smaller units using a quadtree structure known as a residual quadtree (RQT). Leaf nodes of the RQT may correspond to TUs. Pixel difference values associated with the TUs may be transformed to produce transform coefficients. The transform coefficients may then be quantized by the encoder engine 106.
[0085] Once the images of the video data are partitioned into CUs, the encoder engine 106 predicts each PU using a prediction mode. The prediction unit or prediction block is then subtracted from the original video data to obtain residuals (described below). For each CU, a prediction mode may be signaled within the bit stream using syntax data. A prediction mode may include intra-prediction (or intra-image prediction) or inter-prediction (or inter-image prediction). Intra-prediction uses the correlation between spatially neighboring samples within an image. For example, using intra-prediction, each PU is predicted from neighboring image data in the same image using, for example, DC prediction to find an average value for the PU, planar prediction to fit a planar surface to the PU, direction prediction to extrapolate from neighboring data, or any other suitable types of prediction. Inter-prediction uses the temporal correlation between images in order to derive a motion-compensated prediction for a block of image samples. For example, using inter-prediction, each PU is predicted using motion compensation prediction from image data in one or more reference images (before or after the current image in output order). The decision whether to code an image area using inter-image or intra-image prediction may be made, for example, at the CU level.
[0086] In some examples, one or more slices of an image are assigned a slice type. Slice types include an I slice, a P slice, and a B slice. An I slice (intra-frames, independently decodable) is a slice of an image that is coded only by intra-prediction and, therefore, is independently decodable, since the I slice requires only the data within the frame to predict any prediction unit or prediction block of the slice. A P slice (uni-directional predicted frames) is a slice of an image that may be coded with intra-prediction and with uni-directional inter-prediction. Each prediction unit or prediction block within a P slice is coded with either intra-prediction or inter-prediction. When inter-prediction applies, the prediction unit or prediction block is predicted by only one reference image, and therefore reference samples are only from one reference region of one frame. A B slice (bi-directional predictive frames) is a slice of an image that may be coded with intra-prediction and with inter-prediction (e.g., either bi-prediction or uni-prediction). A prediction unit or prediction block of a B slice may be
predicted bi-directionally from two reference images, where each image contributes one reference region, and sample sets of the two reference regions are weighted (e.g., with equal weights or with different weights) to produce the prediction signal of the bi-directional predicted block. As explained above, slices of one image are independently coded. In some cases, an image can be coded as just one slice.
[0087] A PU may include data (e.g., motion parameters or other suitable data) related to the prediction process. For example, when the PU is encoded using intra-prediction, the PU may include data describing an intra-prediction mode for the PU. As another example, when the PU is encoded using inter-prediction, the PU may include data defining a motion vector for the PU. The data defining the motion vector for a PU may describe, for example, a horizontal component of the motion vector (Δx), a vertical component of the motion vector (Δy), a resolution for the motion vector (e.g., integer precision, one-quarter pixel precision, or one-eighth pixel precision), a reference image to which the motion vector points, a reference index, a reference image list (e.g., List 0, List 1, or List C) for the motion vector, or any combination thereof.
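As a worked example of motion vector resolution: HEVC signals luma motion vectors in quarter-sample units, so a coded integer pair is converted to a sample displacement by dividing by four.

```python
# Quarter-pel motion vector example: the coded integer components
# (mvx, mvy) = (5, -2) describe a displacement of (1.25, -0.5) samples.
def mv_to_samples(mvx: int, mvy: int) -> tuple:
    return (mvx / 4.0, mvy / 4.0)

assert mv_to_samples(5, -2) == (1.25, -0.5)
```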
[0088] The encoding device 104 may then perform transformation and quantization. For example, following prediction, the encoder engine 106 may calculate residual values corresponding to the PU. Residual values may comprise pixel difference values between the current block of pixels being coded (the PU) and the prediction block used to predict the current block (e.g., the predicted version of the current block). For example, after generating a prediction block (e.g., using inter-prediction or intra-prediction), the encoder engine 106 may generate a residual block by subtracting the prediction block produced by a prediction unit from the current block. The residual block includes a set of pixel difference values that quantify differences between pixel values of the current block and pixel values of the prediction block. In some examples, the residual block may be represented in a two-dimensional block format (e.g., a two-dimensional matrix or array of pixel values). In such examples, the residual block is a two-dimensional representation of the pixel values.
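A tiny worked example of the residual computation just described, using a 2 x 2 block:

```python
# residual = current block minus prediction block, element by element
current   = [[52, 55], [54, 56]]   # block of the source image
predicted = [[50, 50], [50, 50]]   # prediction from intra/inter-prediction
residual  = [[c - p for c, p in zip(c_row, p_row)]
             for c_row, p_row in zip(current, predicted)]
# residual == [[2, 5], [4, 6]]; this is what is transformed and quantized
```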
[0089] Any residual data that may remain after prediction is performed is transformed using a block transform, which may be based on a discrete cosine transform, a discrete sine transform, an integer transform, a wavelet transform, another suitable transform function, or any combination thereof. In some cases, one or more block transforms (e.g., of sizes 32 x 32, 16 x 16, 8 x 8, 4 x 4, or the like) may be applied to the residual data in each CU. In some examples, a TU may be used for the transform and quantization processes implemented by the encoder engine 106. A given CU having one or more PUs may also
Petition 870190048209, of 05/23/2019, p. 55/202
38/156 can include one or more TUs. As described in more detail below, the residual values can be transformed into transform coefficients using the block transforms, and then can be quantized and scanned using TUs to produce serial transform coefficients for entropy coding.
[0090] In some examples, following intra-predictive or inter-predictive coding using the PUs of a CU, the encoder engine 106 can calculate residual data for the TUs of the CU. The PUs can comprise pixel data in the spatial domain (or pixel domain). The TUs can comprise coefficients in the transform domain after application of a block transform. As noted earlier, the residual data can correspond to pixel difference values between pixels of the uncoded image and the prediction values corresponding to the PUs. The encoder engine 106 can form the TUs including the residual data for the CU, and can then transform the TUs to produce transform coefficients for the CU.
[0091] The encoder engine 106 can perform quantization of the transform coefficients. Quantization provides additional compression by quantizing the transform coefficients to reduce the amount of data used to represent the coefficients. For example, quantization can reduce the bit depth associated with some or all of the coefficients. In one example, a coefficient with an n-bit value can be rounded down to an m-bit value during quantization, where n is greater than m.
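A minimal sketch of this rounding idea (assuming a simple bit-shift quantizer, which is an illustration rather than the quantizer of any particular codec):

    def reduce_bit_depth(coeff: int, n: int, m: int) -> int:
        # Round an n-bit coefficient to an m-bit value, where n > m.
        shift = n - m
        return (coeff + (1 << (shift - 1))) >> shift  # add half a step to round

    # Example: a 12-bit coefficient reduced to 8 bits.
    print(reduce_bit_depth(2050, 12, 8))  # -> 128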
[0092] Once quantization is performed, the coded video bitstream includes quantized transform coefficients, prediction information (for example, prediction modes, motion vectors, block vectors, or the like), partitioning information, and any other suitable data, such as other syntax data. The different elements of the coded video bitstream can then be entropy coded by the encoder engine 106. In some examples, the encoder engine 106 may use a predefined scan order to scan the quantized transform coefficients to produce a serialized vector that can be entropy coded. In some examples, the encoder engine 106 may perform an adaptive scan. After scanning the quantized transform coefficients to form a vector (for example, a one-dimensional vector), the encoder engine 106 can entropy code the vector. For example, the encoder engine 106 may use context-adaptive variable-length coding, context-adaptive binary arithmetic coding, syntax-based context-adaptive binary arithmetic coding, probability interval partitioning entropy coding, or another appropriate entropy coding technique.
[0093] As previously described, an HEVC bitstream includes a group of NAL units, including VCL NAL units and non-VCL NAL units. VCL NAL units include coded image data forming a coded video bitstream. For example, a sequence of bits that forms the coded video bitstream is placed in VCL NAL units. Non-VCL NAL units may contain parameter sets with high-level information relating to the coded video bitstream, in addition to other information. For example, a parameter set can include a video parameter set (VPS), a sequence parameter set (SPS), and a picture parameter set (PPS). Examples of the goals of the parameter sets include bit-rate efficiency, error resilience, and the provision of system-layer interfaces. Each slice references a single active PPS, SPS, and VPS to access information that the decoding device 112 can use to decode the slice. An identifier (ID) can be coded for each parameter set, including a VPS ID, an SPS ID, and a PPS ID. An SPS includes an SPS ID and a VPS ID. A PPS includes a PPS ID and an SPS ID. Each slice header includes a PPS ID. Using the IDs, the active parameter sets can be identified for a given slice.
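A minimal sketch of how the active parameter sets could be resolved for a slice by following this chain of IDs (the dictionary representation of parsed parameter sets is an assumption for illustration):

    def resolve_active_parameter_sets(slice_pps_id, pps_map, sps_map, vps_map):
        # Follow the ID chain: slice header -> PPS -> SPS -> VPS.
        pps = pps_map[slice_pps_id]
        sps = sps_map[pps["sps_id"]]
        vps = vps_map[sps["vps_id"]]
        return pps, sps, vps

    pps_map = {0: {"pps_id": 0, "sps_id": 0}}
    sps_map = {0: {"sps_id": 0, "vps_id": 0}}
    vps_map = {0: {"vps_id": 0}}
    print(resolve_active_parameter_sets(0, pps_map, sps_map, vps_map))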
[0094] A PPS includes information that applies to all slices in a given image. Because of this, all slices in an image refer to the same PPS. Slices in different images can also refer to the same PPS. An SPS includes information that applies to all images in the same coded video sequence (CVS) or bitstream. As previously described, a coded video sequence is a series of access units (AUs) that begins with a random access point image (for example, an instantaneous decoding refresh (IDR) image or a broken link access (BLA) image, or another appropriate random access point image) in the base layer and with certain properties (described above), up to and not including a next AU that has a random access point image in the base layer and with certain properties (or the end of the bitstream). The information in an SPS may not change from image to image within a coded video sequence. Images in a coded video sequence can use the same SPS. The VPS includes information that applies to all layers within a coded video sequence or bitstream. The VPS includes a syntax structure with syntax elements that apply to entire coded video sequences. In some examples, the VPS, SPS, or PPS can be transmitted in-band with the coded bitstream. In some examples, the VPS, SPS, or PPS may be transmitted out-of-band, in a separate transmission from the NAL units containing coded video data.
[0095] A video bitstream can also include Supplemental Enhancement Information (SEI) messages. For example, an SEI NAL unit can be part of the video bitstream. In some cases, an SEI message may contain information that is not needed by the decoding process. For example, the information in an SEI message may not be essential for the decoder to decode the video images of the bitstream, but the decoder can use the information to improve the display or processing of the images (for example, the decoded output). The information in an SEI message can be embedded metadata. In an illustrative example, the information in an SEI message could be used by decoder-side entities to improve the viewability of the content. In some cases, certain application standards may mandate the presence of such SEI messages in the bitstream so that an improvement in quality can be brought to all devices that conform to the application standard (for example, the carriage of the frame-packing SEI message for the frame-compatible plano-stereoscopic 3D video format, where the SEI message is carried for every frame of the video; the handling of a recovery point SEI message; the use of the pan-scan rectangle SEI message in DVB; in addition to many other examples).

[0096] The output 110 of the encoding device 104 can send the NAL units that make up the encoded video data via the communications link 120 to the decoding device 112 of the receiving device. The input 114 of the decoding device 112 can receive the NAL units. The communications link 120 may include a channel provided by a wireless network, a wired network, or a combination of a wired and a wireless network. A wireless network can include any wireless interface or combination of wireless interfaces and can include any suitable wireless network (for example, the Internet or another wide area network, a packet-based network, WiFi™, radio frequency (RF), UWB, WiFi-Direct, cellular, Long-Term Evolution (LTE), WiMax™, or the like). A wired network can include any wired interface (for example, fiber, Ethernet, power-line Ethernet, Ethernet over coaxial cable, digital signal line (DSL), or the like). Wired and/or wireless networks can be implemented using various equipment, such as base stations, routers, access points, bridges, gateways, switches, or the like. The encoded video data can be modulated according to a communication standard, such as a wireless communication protocol, and transmitted to the receiving device.
[0097] In some examples, the encoding device 104 can store the encoded video data in the storage 108. The output 110 can retrieve the encoded video data from the encoder engine 106 or from the storage 108. The storage 108 can include any of a variety of distributed or locally accessed data storage media. For example, the storage 108 can include a hard drive, a storage disc, flash memory, volatile or non-volatile memory, or any other suitable digital storage media for storing encoded video data.
[0098] The input 114 of the decoding device 112 receives the encoded video bitstream data and can provide the video bitstream data to the decoder engine 116, or to the storage 118 for later use by the decoder engine 116. The decoder engine 116 can decode the encoded video bitstream data by entropy decoding (for example, using an entropy decoder) and extracting the elements of one or more coded video sequences that make up the encoded video data. The decoder engine 116 can then rescale and perform an inverse transform on the encoded video bitstream data. The residual data is then passed to a prediction stage of the decoder engine, and the decoder engine 116 then predicts a block of pixels (for example, a PU). In some examples, the prediction is added to the output of the inverse transform (the residual data).
[0099] The decoding device 112 may output the decoded video to a video destination device, which may include a display or other output device for displaying the decoded video data to a content consumer. In some aspects, the video destination device 122 may be part of the receiving device that includes the decoding device 112. In some aspects, the video destination device 122 may be part of a separate device other than the receiving device.
[00100] In some examples, the video encoding device 104 and/or the video decoding device 112 can be integrated with an audio encoding device and an audio decoding device, respectively. The video encoding device 104 and/or the video decoding device 112 may also include other hardware or software that is needed to implement the coding techniques described above, such as one or more microprocessors, digital signal processors (DSPs), application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), discrete logic, software, hardware, firmware, or any combination thereof. The video encoding device 104 and the video decoding device 112 can be integrated as part of a combined encoder/decoder (codec) in a respective device. An example of specific details of the encoding device 104 is described below with reference to Figure 11. An example of specific details of the decoding device 112 is described below with reference to Figure 12.
[00101] Extensions to the HEVC standard include the multiview video coding extension, referred to as MV-HEVC, and the scalable video coding extension, referred to as SHVC. The MV-HEVC and SHVC extensions share the concept of layered coding, with different layers being included in the encoded video bitstream. Each layer in a coded video sequence is addressed by a unique layer identifier (ID). A layer ID can be present in the header of a NAL unit to identify a layer with which the NAL unit is associated. In MV-HEVC, different layers generally represent different views of the same scene in the video bitstream. In SHVC, different scalable layers are provided that represent the video bitstream in different spatial resolutions (or image resolutions) or at different reconstruction fidelities. The scalable layers can include a base layer (with layer ID = 0) and one or more enhancement layers (with layer IDs = 1, 2, ..., N). The base layer can conform to a profile of the first version of HEVC, and represents the lowest available layer in a bitstream. The enhancement layers have increased spatial resolution, temporal resolution or frame rate, and/or reconstruction (or quality) fidelity compared to the base layer. The enhancement layers are organized hierarchically and may (or may not) depend on lower layers. In some examples, the different layers can be coded using a single-standard codec (for example, all layers are encoded using HEVC, SHVC, or another coding standard). In some examples, different layers can be coded using a multi-standard codec. For example, a base layer can be coded using AVC, while one or more enhancement layers can be coded using the SHVC and/or MV-HEVC extensions to the HEVC standard.
[00102] Several standards have also been defined that describe the colors in a captured video, including the contrast ratio (for example, the brightness or darkness of the pixels in the video) and the color accuracy, among other things. Color parameters can be used, for example, by a display device that is able to use the color parameters to determine how to display the pixels in the video. One example standard from the International Telecommunication Union (ITU), ITU-R Recommendation BT.709 (referred to here as BT.709), sets a standard for High Definition Television (HDTV). The color parameters defined by BT.709 are usually referred to as the Standard Dynamic Range (SDR) and the standard color gamut. Another example standard is ITU-R Recommendation BT.2020 (referred to here as BT.2020), which sets a standard for Ultra High Definition Television (UHDTV). The color parameters defined by BT.2020 are commonly referred to as the High Dynamic Range (HDR) and the Wide Color Gamut (WCG). Dynamic range and color gamut are referred to here collectively as color volume.
[00103] Display devices may not be able to display the color volume of a video signal that uses a high dynamic range and a wide color gamut. For example, an HDR video signal can have an absolute brightness value for each pixel. In broad daylight, the video signal can include some samples equal to 10,000 candelas per square meter (cd/m², often referred to as nits). A typical high-definition imaging (HDI) display, however, may only be able to display 1,000 nits, while professional studio displays may be able to display 4,000 nits.
[00104] To allow various different types of display devices to display HDR video signals and other video signals with large color volumes, standards have been defined for color volume transforms. Color volume transforms can be used to transform an input dynamic range and color gamut to an output dynamic range and color gamut that can be displayed by a display device. Examples of color volume transform standards include a suite of standards defined by the Society of Motion Picture and Television Engineers (SMPTE), ST 2094. Within the ST 2094 suite, four documents, ST 2094-10, ST 2094-20, ST 2094-30, and ST 2094-40, define metadata that can be used in color volume transforms. Other applicable standards include, for example, SMPTE ST 2084, which provides a transfer function that allows the display of HDR video content with a luminance level of up to 10,000 nits and can be used with the color space defined by BT.2020. Another example of an applicable standard is SMPTE ST 2086, which specifies metadata items to specify the color volume (the primary colors, white point, and luminance range) of the display that was used in mastering the video content.
[00105] Among the standards mentioned above, ST 2094-10 specifies content-dependent color volume transform metadata, a specialized model of the generalized color volume transform defined by ST 2094-1. This color volume transform is based on a parametrically defined tone mapping curve, the shape of which is defined both by image essence characteristics (algorithmically computed from the input image essence) and, possibly, also by manually adjusted trims. This metadata is generated as part of the mastering process, that is, the production of a master copy to be used for the production of copies for distribution. The adjustment parameters can be decided on as a creative adjustment.
[00106] The color volume transform parameters defined by ST 2094-10 are provided as abstract, floating-point values. To deliver these parameters to a decoder, a format that provides these parameters in a more compact and efficient form is needed. Greater efficiency can be measured, for example, in terms of the bits needed to represent the values and/or the computational complexity needed to determine and/or use the values. ST 2094-10, however, does not define a format for encoding the color volume transform parameters in a bitstream.
[00107] In various implementations, systems and methods are provided for encoding color volume transform parameters defined by ST 2094-10 in a bit stream. In some examples, a set of color volume transformation parameters can be provided with video data. In addition, a set of master display color volume values can be provided. The master display color volume parameters include values determined when generating a master copy of the video data. In some implementations, the color volume transform parameters can be encoded into a bit stream, along with video data. In these implementations, the master display color volume parameters are required to be encoded in the bit stream.
[00108] In some examples, video data may include two or more video signals, where each video signal can be displayed in a separate display region within the display area of a display device. In these examples, the video data can include sets of color volume transformation parameters for the two or more video signals. In some implementations, an encoder can determine an association between a set of color volume transformation parameters and a display region for a video signal. The association can be encoded in the bit stream,
together with the video data.
[00109] The preceding examples can be used to implement an HDR video system, including an encoding device to produce an encoded bit stream and / or a decoding device to decode a bit stream and format the decoded video for display. By defining several restrictions on the parameters provided by ST 2094-10, an unambiguous definition of these parameters can be provided, which can simplify the implementation of HDR video systems.
[00110] Video standards that define larger color volumes attempt to replicate more closely what the human eye is able to see. As noted above, a color volume can include a dynamic range and a color gamut, where the dynamic range and the color gamut are independent attributes of video content.
[00111] Dynamic range can be defined as the ratio between the minimum and the maximum brightness of a video signal. Dynamic range can also be measured in terms of f-stops. In cameras, an f-stop is the ratio of the focal length of a lens to the diameter of the camera's aperture. One f-stop can correspond to a doubling of the dynamic range of a video signal. As an example, MPEG defines HDR content as content that features brightness variations of more than 16 f-stops. In some examples, a dynamic range between 10 and 16 f-stops is considered an intermediate dynamic range, although in other examples it is considered an HDR dynamic range.
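Since one f-stop corresponds to a doubling of brightness, the number of f-stops spanned by a luminance range is log2(max/min). A minimal sketch of this arithmetic:

    import math

    def f_stops(min_nits: float, max_nits: float) -> float:
        # Number of f-stops (doublings) between the minimum and maximum luminance.
        return math.log2(max_nits / min_nits)

    # An SDR display covering about 0.1 to 100 nits spans about 10 f-stops.
    print(round(f_stops(0.1, 100.0), 1))  # -> 10.0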
[00112] Figure 2 illustrates the dynamic range of typical human vision 202 in comparison with the dynamic range of various display types. Figure 2 illustrates a luminance range 200, on a logarithmic nits scale (for example, on a logarithmic cd/m² scale). For example, starlight is at approximately 0.0001 nits on the illustrated luminance range 200, and moonlight is at approximately 0.01 nits. Typical indoor light can be between 1 and 100 nits on the luminance range 200. Sunlight can be between 10,000 nits and 1,000,000 nits on the luminance range 200.
[00113] Human vision 202 is capable of perceiving anywhere from less than 0.0001 nits to more than 1,000,000 nits, with the precise range varying from person to person. The dynamic range of human vision 202 includes a simultaneous dynamic range 204. The simultaneous dynamic range 204 is defined as the ratio between the highest and the lowest luminance values at which objects can be detected while the eye is at full adaptation. Full adaptation occurs when the eye is in a steady state after having adjusted to a current ambient light condition or luminance level. Although the simultaneous dynamic range 204 is illustrated in the example of Figure 2 as being between about 0.1 nits and about 3,200 nits, the simultaneous dynamic range 204 can be centered at other points along the luminance range 200, and its width can vary at different luminance levels. Additionally, the simultaneous dynamic range 204 can vary from one person to another.

[00114] Figure 2 further illustrates an approximate dynamic range for SDR displays 206 and HDR displays 208. SDR displays 206 include monitors, televisions, tablet screens, smartphone screens, and other display devices that are capable of displaying SDR video. HDR displays 208 include, for example, high definition televisions and other televisions and monitors.
[00115] BT.709 provides that the dynamic range of SDR displays 206 can be about 0.1 to 100 nits, or about 10 f-stops, which is significantly less than the dynamic range of human vision 202. The dynamic range of SDR displays 206 is also less than the illustrated simultaneous dynamic range 204. SDR displays 206 are also unable to accurately reproduce night-time conditions (for example, starlight, at about 0.0001 nits) or bright outdoor conditions (for example, around 1,000,000 nits).
[00116] HDR displays 208 can cover a wider dynamic range than SDR displays 206. For example, HDR displays 208 can have a dynamic range of about 0.01 nits to about 5,600 nits, or 16 f-stops. While HDR displays 208 also do not encompass the full dynamic range of human vision, HDR displays 208 may come closer to being able to cover the simultaneous dynamic range 204 of the average person. Specifications for dynamic range parameters for HDR displays 208 can be found, for example, in BT.2020 and ST 2084.
[00117] Color gamut describes the range of colors that are available on a particular device, such as a display or a printer. Color gamut can also be referred to as color dimension. Figure 3 illustrates an example of a chromaticity diagram 300, overlaid with a triangle representing an SDR color gamut 304 and a triangle representing an HDR color gamut 302. The values on curve 306 in the diagram 300 are the spectrum of colors; that is, the colors evoked by a single wavelength of light in the visible spectrum. The colors below curve 306 are non-spectral: the straight line between the lower points of curve 306 is referred to as the line of purples, and the colors within the interior of the diagram 300 are unsaturated colors that are various mixtures of a spectral color or a purple color with white. A point marked D65 indicates the location of white for the illustrated spectral curve 306. Curve 306 can also be referred to as the spectrum locus or spectral locus.

[00118] The triangle representing the SDR color gamut 304 is based on the red, green, and blue primary colors as provided by BT.709. The SDR color gamut 304 is the color space used by HDTVs, SDR broadcasts, and other digital media content.

[00119] The triangle representing the HDR color gamut 302 is based on the red, green, and blue primary colors as provided by BT.2020. As shown in Figure 3, the HDR color gamut 302 provides about 70% more colors than the SDR color gamut 304. Color gamuts defined by other standards, such as Digital Cinema Initiatives (DCI) P3 (referred to as DCI-P3), provide even more colors than the HDR color gamut 302. DCI-P3 is used for digital movie projection.
[00120] Table 1 illustrates examples of color gamut parameters, including those provided by BT.709, BT.2020, and DCI-P3. For each color gamut definition, Table 1 provides x and y coordinates for a chromaticity diagram.

Table 1: Color Gamut Parameters

Color       White point         Primary colors
Space       xW      yW          xR     yR     xG     yG     xB     yB
DCI-P3      0.314   0.351       0.68   0.32   0.265  0.69   0.15   0.06
BT.709      0.3127  0.329       0.64   0.33   0.30   0.60   0.15   0.06
BT.2020     0.3127  0.329       0.708  0.292  0.170  0.797  0.131  0.046
[00121] Video data with a large color volume (for example, video data with a high dynamic range and a wide color gamut) can be acquired and stored with a high degree of precision per component. For example, floating-point values can be used to represent the luma and chroma values of each pixel. As a further example, the 4:4:4 chroma format, in which the luma, chroma-blue, and chroma-red components each have the same sample rate, can be used. The 4:4:4 notation can also be used to refer to the red-green-blue (RGB) color format. As a further example, a very wide color space, such as that defined by the International Commission on Illumination (CIE) 1931 XYZ, can be used. Video data represented with a high degree of precision can be nearly mathematically lossless. A high-precision representation, however, can include redundancies and may not be optimal for compression. Thus, a lower-precision format that aims to cover the color volume that can be seen by the human eye is often used.
[00122] Figure 4 illustrates an example of a process 400 for converting high-precision linear RGB 402 video data to HDR data 410. The HDR data 410 can have a lower precision and can be more easily compressed. The example process 400 includes a non-linear transfer function 404, which can compact the dynamic range, a color conversion 406, which can produce a more compact or robust color space, and a quantization function 408, which can convert floating-point representations to integer representations.
[00123] In various examples, linear RGB data 402, which can have a high dynamic range and a floating-point representation, can be compacted using the non-linear transfer function 404. An example of the non-linear transfer function 404 is the perceptual quantizer defined in ST 2084. The output of the transfer function 404 can be converted to a target color space by the color conversion 406. The target color space can be one that is more suitable for compression, such as YCbCr. The quantization 408 can then be used to convert the data to an integer representation.

[00124] The order of the steps of the example process 400 is one example of the order in which the steps can be performed. In other examples, the steps can occur in a different order. For example, the color conversion 406 can precede the transfer function 404. In other examples, additional processing can also occur. For example, spatial subsampling can be applied to the color components.

[00125] The transfer function 404 can be used to map the digital values in an image to and from optical energy. Optical energy, which is also referred to as optical power, is the degree to which a lens, mirror, or other optical system converges or diverges light. The transfer function 404 can be applied to the data in an image to compact the dynamic range. Compacting the dynamic range can allow video content to represent the data with a limited number of bits. The transfer function 404 can be a one-dimensional, non-linear function that can either reflect the inverse of the electro-optical transfer function (EOTF) of an end consumer display (for example, as specified for SDR in ITU-R Recommendation BT.1886 (referred to here as BT.1886) or in BT.709), or approximate the human visual system's perception of changes in brightness (for example, as provided for HDR by the perceptual quantizer (PQ) transfer function specified in ST 2084). An electro-optical transfer function describes how to convert digital values, referred to as code levels or code values, into visible light. The inverse process of the electro-optical transform is the optical-electro transfer function (OETF), which produces code levels from luminance.
[00126] Figure 5 illustrates examples of luminance curves produced by transfer functions defined by various standards. Each curve shows the luminance value at different code levels. Figure 5 also illustrates the dynamic range enabled by each transfer function. In other examples, curves can be drawn separately for the red (R), green (G), and blue (B) color components.
[00127] A reference electro-optical transfer function is specified in BT.1886. The transfer function is given by the following equation:

L = a * max(V + b, 0)^γ

[00128] In the above equation:

[00129] L is the screen luminance in cd/m²;

[00130] LW is the screen luminance for white;

[00131] LB is the screen luminance for black;

[00132] V is the input video signal level (normalized such that black occurs at V = 0 and white at V = 1). For content mastered according to BT.709, 10-bit digital code values D map to V values by the following equation: V = (D - 64) / 876;

[00133] γ is the exponent of the power function, where γ = 2.404;

[00134] a is a variable for user gain (legacy contrast control), where:

a = (LW^(1/γ) - LB^(1/γ))^γ

[00135] and b is a variable for user black level lift (legacy brightness control), where:

b = LB^(1/γ) / (LW^(1/γ) - LB^(1/γ))
[00136] The variables a and b above can be derived by solving the following equations, such that V = 1 gives L = LW and such that V = 0 gives L = LB:

LB = a * b^γ
LW = a * (1 + b)^γ
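A minimal numeric sketch of this EOTF, transcribing the equations above (the white and black luminance values chosen here are illustrative only):

    def bt1886_eotf(v: float, lw: float = 100.0, lb: float = 0.1,
                    gamma: float = 2.404) -> float:
        # BT.1886 reference EOTF: L = a * max(V + b, 0) ** gamma.
        a = (lw ** (1 / gamma) - lb ** (1 / gamma)) ** gamma
        b = lb ** (1 / gamma) / (lw ** (1 / gamma) - lb ** (1 / gamma))
        return a * max(v + b, 0.0) ** gamma

    print(bt1886_eotf(0.0))  # black level: approximately LB = 0.1
    print(bt1886_eotf(1.0))  # white level: approximately LW = 100.0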
[00137] ST 2084 provides a transfer function that can more efficiently support data with a larger dynamic range. The transfer function of ST 2084 is applied to the linear values R, G, and B to produce the non-linear, normalized representations R', G', and B'. ST 2084 further defines the normalization NORM = 10000, which is associated with a peak brightness of 10,000 nits. The values of R', G', and B' can be calculated as follows:

R' = PQ_TF(max(0, min(R / NORM, 1)))
G' = PQ_TF(max(0, min(G / NORM, 1)))    (1)
B' = PQ_TF(max(0, min(B / NORM, 1)))

[00138] In Equation (1), the transfer function, PQ_TF, is defined as follows:

PQ_TF(L) = ((c1 + c2 * L^m1) / (1 + c3 * L^m1))^m2

where:
m1 = 2610 / 16384 = 0.1593017578125
m2 = 2523 / 4096 * 128 = 78.84375
c1 = 3424 / 4096 = 0.8359375 = c3 - c2 + 1
c2 = 2413 / 4096 * 32 = 18.8515625
c3 = 2392 / 4096 * 32 = 18.6875

[00139] The electro-optical transfer function can be defined as a function with floating-point precision. With floating-point precision, it is possible to avoid introducing errors into a signal that incorporates the non-linearity of the function when the inverse, optical-electro transfer function is applied. This inverse transfer function specified by ST 2084 is as follows:
R = 10000 * inversePQ_TF(R')
G = 10000 * inversePQ_TF(G')    (2)
B = 10000 * inversePQ_TF(B')

[00140] In Equation (2), the inverse transfer function, inversePQ_TF, is defined as follows:
inversePQ_TF(N) = (max(N^(1/m2) - c1, 0) / (c2 - c3 * N^(1/m2)))^(1/m1)

where:
m1 = 2610 / 16384 = 0.1593017578125
m2 = 2523 / 4096 * 128 = 78.84375
c1 = 3424 / 4096 = 0.8359375 = c3 - c2 + 1
c2 = 2413 / 4096 * 32 = 18.8515625
c3 = 2392 / 4096 * 32 = 18.6875

[00141] Other transfer functions and inverse transfer functions have been defined. A video coding system can use one of these other transfer functions and inverse transfer functions instead of, or in addition to, those provided by ST 2084.
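A minimal sketch of the PQ forward and inverse transfer functions, transcribing Equations (1) and (2) and the constants above:

    M1 = 2610 / 16384        # 0.1593017578125
    M2 = 2523 / 4096 * 128   # 78.84375
    C1 = 3424 / 4096         # 0.8359375
    C2 = 2413 / 4096 * 32    # 18.8515625
    C3 = 2392 / 4096 * 32    # 18.6875
    NORM = 10000.0           # peak brightness of 10,000 nits

    def pq_tf(l: float) -> float:
        # Forward PQ: normalized linear light in [0, 1] -> non-linear value.
        lm1 = l ** M1
        return ((C1 + C2 * lm1) / (1 + C3 * lm1)) ** M2

    def inverse_pq_tf(n: float) -> float:
        # Inverse PQ: non-linear value -> normalized linear light in [0, 1].
        nm2 = n ** (1 / M2)
        return (max(nm2 - C1, 0.0) / (C2 - C3 * nm2)) ** (1 / M1)

    # Round trip for a 100-nit sample.
    r_prime = pq_tf(max(0.0, min(100.0 / NORM, 1.0)))
    print(10000 * inverse_pq_tf(r_prime))  # -> approximately 100.0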
[00142] Color conversion 406 can reduce the size of the color space of the linear RGB input 402.
Image capture systems often capture images as RGB data. The RGB color space, however, can have a high degree of redundancy between the color components. RGB is therefore not optimal for producing a compact representation of the data. To achieve a more compact and more robust representation, the RGB components can be converted to a less correlated color space, such as YCbCr, which may be more suitable for compression. The target color space produced by the color conversion 406 can separate the brightness, represented by luminance, and the color information into different, uncorrelated components.
[00143] The YCbCr color space is one target color space used by BT.709. BT.709 provides the following conversion from the non-linear R', G', and B' values to a non-constant luminance representation Y', Cb, and Cr:

Y' = 0.2126 * R' + 0.7152 * G' + 0.0722 * B'
Cb = (B' - Y') / 1.8556    (3)
Cr = (R' - Y') / 1.5748

[00144] The conversion provided by Equation (3) can also be implemented using the following approximate conversion, which avoids the division for the Cb and Cr components:
Y' = 0.212600 * R' + 0.715200 * G' + 0.072200 * B'
Cb = -0.114572 * R' - 0.385428 * G' + 0.500000 * B'    (4)
Cr = 0.500000 * R' - 0.454153 * G' - 0.045847 * B'

[00145] BT.2020 specifies the following conversion process from R', G', and B' to Y', Cb, and Cr:
Y' = 0.2627 * R' + 0.6780 * G' + 0.0593 * B'
Cb = (B' - Y') / 1.8814    (5)
Cr = (R' - Y') / 1.4746

[00146] The conversion provided by Equation (5) can also be implemented using the following approximate conversion, which avoids the division for the Cb and Cr components:
Y' = 0.262700 * R' + 0.678000 * G' + 0.059300 * B'
Cb = -0.139630 * R' - 0.360370 * G' + 0.500000 * B'    (6)
Cr = 0.500000 * R' - 0.459786 * G' - 0.040214 * B'

[00147] After the color conversion 406, the input data, now in the target color space, can still be represented with a high bit depth (for example, with floating-point precision). The quantization 408 can convert the data to a target bit depth. In some examples, a bit depth of 10 to 12 bits, in combination with the PQ transfer function, can be sufficient for the HDR data 410 to have 16 f-stops with a distortion that is just below what is noticeable by human vision. HDR data 410 with 10-bit precision can also be coded by most video coding systems. The quantization 408 is lossy, meaning that some information is lost, and it can be a source of inaccuracy in the HDR data 410 produced by the process 400.
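The approximate conversions of Equations (4) and (6) are plain 3 x 3 matrix multiplications; a minimal sketch applying the BT.2020 coefficients of Equation (6):

    # BT.2020 approximate R'G'B' -> Y'CbCr matrix from Equation (6).
    BT2020_MATRIX = [
        [ 0.262700,  0.678000,  0.059300],  # Y'
        [-0.139630, -0.360370,  0.500000],  # Cb
        [ 0.500000, -0.459786, -0.040214],  # Cr
    ]

    def rgb_to_ycbcr(r: float, g: float, b: float, m=BT2020_MATRIX):
        # Multiply the non-linear R', G', B' triplet by the conversion matrix.
        return tuple(row[0] * r + row[1] * g + row[2] * b for row in m)

    print(rgb_to_ycbcr(1.0, 1.0, 1.0))  # white -> Y' ~ 1.0, Cb ~ Cr ~ 0.0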
[00148] The following equations provide an example of the quantization 408 that can be applied to code words in the target color space. For example, input values for Y', Cb, and Cr that have floating-point precision can be converted to a fixed bit depth BitDepthY for the Y' value and BitDepthC for the chroma values (Cb and Cr):

DY' = Clip1Y(Round((1 << (BitDepthY - 8)) * (219 * Y' + 16)))
DCb = Clip1C(Round((1 << (BitDepthC - 8)) * (224 * Cb + 128)))    (7)
DCr = Clip1C(Round((1 << (BitDepthC - 8)) * (224 * Cr + 128)))

[00149] Above:
Round(x) = Sign(x) * Floor(Abs(x) + 0.5)

Sign(x) = -1 if x < 0, 0 if x = 0, 1 if x > 0

Floor(x) is the largest integer less than or equal to x

Abs(x) = x if x >= 0, -x if x < 0

Clip1Y(x) = Clip3(0, (1 << BitDepthY) - 1, x)

Clip1C(x) = Clip3(0, (1 << BitDepthC) - 1, x)
Clip3(x, y, z) = x if z < x, y if z > y, z otherwise

[00150] The HDR data 410 produced by the example process 400 can be compressed or encoded by an encoder, for example, using the AVC or HEVC standards, or VP8/VP9/VP10, to produce an encoded bitstream. A bitstream can be stored and/or transmitted. The bitstream can be decompressed or decoded by a decoder to produce an uncompressed video signal.
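A minimal sketch of Equation (7) for 10-bit output, transcribing the fixed-point quantization and the Round/Clip helpers defined above:

    def round_half_away(x: float) -> int:
        sign = (x > 0) - (x < 0)
        return sign * int(abs(x) + 0.5)  # Round(x) = Sign(x) * Floor(Abs(x) + 0.5)

    def clip3(lo: int, hi: int, z: int) -> int:
        return lo if z < lo else hi if z > hi else z

    def quantize_ycbcr(y: float, cb: float, cr: float,
                       bd_y: int = 10, bd_c: int = 10):
        # Equation (7): floating-point Y'CbCr -> fixed bit depth code values.
        dy = clip3(0, (1 << bd_y) - 1,
                   round_half_away((1 << (bd_y - 8)) * (219 * y + 16)))
        dcb = clip3(0, (1 << bd_c) - 1,
                    round_half_away((1 << (bd_c - 8)) * (224 * cb + 128)))
        dcr = clip3(0, (1 << bd_c) - 1,
                    round_half_away((1 << (bd_c - 8)) * (224 * cr + 128)))
        return dy, dcb, dcr

    print(quantize_ycbcr(1.0, 0.0, 0.0))  # -> (940, 512, 512) for 10-bit video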
[00151] The uncompressed video signal can be transmitted to an end consumer device using, for example, a high-speed digital interface. Examples of consumer electronic devices and transmission media include digital televisions, digital cable, satellite or terrestrial set-top boxes, mobile devices, and related peripheral devices, such as digital versatile disc (DVD) players and/or recorders, and other related decoding devices and consumer devices.
[00152] Protocols, requirements, and recommendations for high-speed digital interfaces are defined in specifications produced by the Consumer Technology Association (CTA) Digital Television (DTV) Subcommittee, such as CTA-861. Examples of protocols, requirements, and recommendations defined by CTA-861 include video formats and waveforms; colorimetry and quantization; transport of compressed and uncompressed video data, as well as Linear Pulse Code Modulation (LPCM) audio; carriage of auxiliary data; and implementations of the Video Electronics Standards Association (VESA) Enhanced Extended Display Identification Data (E-EDID) standard, which is used by consumer devices to declare display capabilities and characteristics.
[00153] The CTA 861-G version of the CTA-861 specification includes an extended InfoFrame data structure, which can carry larger amounts of dynamic metadata. Dynamic, in this context, means that the data can vary over time. The data carried in the extended InfoFrame data structure can be used by an end device, such as a display, television, or other device that can process a video signal, such as a decoder or receiver. The data can be used, for example, for intelligent processing, guided mapping, display adaptation, and color volume transformation applicable to the end device. An extended InfoFrame can have a type that is specified by a 2-byte number. When the extended InfoFrame type value is set to 0x0001, 0x0002, 0x0003, or 0x0004, the extended InfoFrame carries dynamic HDR metadata. The dynamic HDR metadata InfoFrame contains dynamic HDR metadata that can be encoded in supplemental enhancement information (SEI) messages in an encoded bitstream. SEI messages can be used in AVC, HEVC, and VP8/VP9/VP10 bitstreams, as well as in bitstreams produced according to other standards.
[00154] A decoder can support the transmission of certain types of extended InfoFrames with dynamic HDR metadata. The decoder can further determine whether a target end device is capable of receiving the dynamic HDR metadata extended InfoFrames and, if so, can send the InfoFrames with the associated video, encoded according to the InfoFrame type. In some examples, a decoder will not send a dynamic HDR metadata extended InfoFrame of type 0x0001, 0x0002, 0x0003, or 0x0004 to an end device that does not indicate support for that type of extended InfoFrame. The end device can, for example, use an HDR Dynamic Metadata Data Block to indicate the types of extended dynamic HDR metadata that the end device supports.
[00155] Communication between an end device and a decoder can be conducted using Extended Display Identification Data (EDID). EDID is a data structure provided by an end device to describe the capabilities of the end device. For example, the EDID can describe the video formats that the end device is capable of receiving and rendering. An end device can provide the EDID to a decoder upon request from the decoder. The decoder can select an output format based on the information provided by the EDID, taking into account the format of an input bitstream and the formats supported by the end device.
[00156] In various examples, several data blocks can be used to specify parameters that describe the display capabilities of an end device. Examples of such data blocks include a Colorimetry Data Block, an HDR Static Metadata Data Block, an HDR Dynamic Metadata Data Block, and other data blocks. The Colorimetry Data Block can indicate colorimetry standards and color gamut standards, such as BT.2020 or DCI-P3, supported by an end device. The HDR Static Metadata Data Block indicates the HDR capabilities of the end device through parameters, such as parameters that describe EOTF characteristics of the display (for example, BT.1886, ST 2084, or others), parameters that describe a desired dynamic range (for example, a desired minimum and/or maximum luminance), and/or parameters that describe a desired maximum average luminance for optimal rendering of content on the display. The HDR Dynamic Metadata Data Block indicates the types of dynamic HDR metadata supported.
[00157] As noted above, SMPTE ST 2094 specifies four different color volume transforms, each published in a separate document. These documents are designated ST 2094-10, ST 2094-20, ST 2094-30, and ST 2094-40.
[00158] ST 2094-10 describes dynamic HDR metadata, where dynamic can mean that the color volume transform can depend on the video content. For example, ST 2094-10 defines a parametric tone mapping function. ST 2094-10 further specifies that tone mapping can be performed in various color spaces, including YCbCr, RGB, and color spaces based on the human visual system. ST 2094-10 also provides a mathematical description of an example color volume transform from RGB input.
[00159] Figure 6 illustrates an example of processing blocks 610 that can be used in implementations of ST 2094-10. The processing blocks 610 are illustrated within a framework for a generalized color volume transform model provided by ST 2094-1. In this framework, an image 602, which can be a video frame or a part of a video frame, can undergo an input conversion 604, if needed. The input conversion 604 can convert the color space of the image 602 to an input color space. After the processing blocks 610 have operated on the image, an output conversion 606 can be applied, if needed, to convert the color space of the image to an output color space. The result of the overall process is a transformed image 608.
[00160] The processing blocks 610 of implementations of ST 2094-10 include a tone mapping block 612, a color gamut adjustment block 614, and a detail management block 616.
[00161] The parameters and operation of the processing blocks 610 can be described as follows. In the description that follows, PQ means perceptual quantizer:
[00162] Tone mapping based on maxRGB:

[00163] For tone mapping based on maxRGB, the following parameters are defined:

[00164] MinimumPqencodedMaxrgb - the minimum PQ-encoded maxRGB value of the reduced pixel set;

[00165] AveragePqencodedMaxrgb - the average of the PQ-encoded maxRGB values of the reduced pixel set;

[00166] MaximumPqencodedMaxrgb - the maximum PQ-encoded maxRGB value of the reduced pixel set;

[00167] MinimumPqencodedMaxrgbOffset - an offset in the same unit as MinimumPqencodedMaxrgb, to be added to the MinimumPqencodedMaxrgb value;

[00168] AveragePqencodedMaxrgbOffset - an offset in the same unit as AveragePqencodedMaxrgb, to be added to the AveragePqencodedMaxrgb value;

[00169] MaximumPqencodedMaxrgbOffset - an offset in the same unit as MaximumPqencodedMaxrgb, to be added to the MaximumPqencodedMaxrgb value.

[00170] Tone mapping based on gain, offset, and gamma:

[00171] Equation (8) below defines the tone mapping function for tone mapping based on gain, offset, and gamma:

y = (min(max(0, (x * g) + o), 1))^P    (8)

[00172] In Equation (8), y = the output value; x = the input value; g = the tone mapping gain value; o = the tone mapping offset value; and P = the tone mapping gamma value.
[00173] The following HDR parameters can be signaled (for example, provided and/or encoded in a bitstream) for use in Equation (8):

[00174] ToneMappingOffset - the tone mapping offset o used in Equation (8).

[00175] ToneMappingGain - the tone mapping gain g used in Equation (8).

[00176] ToneMappingGamma - the tone mapping gamma P used in Equation (8).
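A minimal sketch of Equation (8), with illustrative values for the three signaled parameters:

    def tone_map(x: float, gain: float, offset: float, gamma: float) -> float:
        # Equation (8): y = (min(max(0, (x * g) + o), 1)) ** P.
        return min(max(0.0, (x * gain) + offset), 1.0) ** gamma

    # Illustrative ToneMappingGain, ToneMappingOffset, and ToneMappingGamma values.
    print(tone_map(0.5, gain=1.2, offset=-0.05, gamma=2.2))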
[00177] The following parameters are also defined in ST 2094-10:

[00178] ChromaCompensationWeight is an amount of chroma adjustment;

[00179] SaturationGain is an amount of saturation adjustment;

[00180] ToneDetailFactor is a parameter that controls the contribution of the detail management function to the tone mapping result.
[00181] The following restrictions are also defined by ST 2094-10. ST 2094-10 specifies the scope of the HDR metadata such that the metadata shall contain exactly one of each of the following:

[00182] TimeInterval information, defined through parameters specified in ST 2094-1; includes:

[00183] TimeIntervalStart

[00184] TimeIntervalDuration

[00185] ProcessingWindow information, defined through parameters specified in ST 2094-1; includes:

[00186] UpperLeftCorner

[00187] LowerRightCorner

[00188] WindowNumber

[00189] TargetedSystemDisplay information, defined through parameters specified in ST 2094-1; includes:

[00190] TargetedSystemDisplayPrimaries

[00191] TargetedSystemDisplayWhitePointChromaticity

[00192] TargetedSystemDisplayMaximumLuminance

[00193] TargetedSystemDisplayMinimumLuminance

[00194] Color volume transform parameters:

[00195] ImageCharacteristicsLayer, which shall contain exactly one of each of the following named items:

[00196] MinimumPqencodedMaxrgb

[00197] AveragePqencodedMaxrgb

[00198] MaximumPqencodedMaxrgb

[00199] ManualAdjustmentLayer, which can contain
any combination having zero or one of each of the following named items:

[00200] MinimumPqencodedMaxrgbOffset

[00201] AveragePqencodedMaxrgbOffset

[00202] MaximumPqencodedMaxrgbOffset

[00203] ToneMappingOffset

[00204] ToneMappingGain

[00205] ToneMappingGamma

[00206] ChromaCompensationWeight

[00207] SaturationGain

[00208] ToneDetailFactor

[00209] The ST 2094 suite of standards defines the parameters that can be used for color volume transforms. The standard also provides examples of information that illustrate how the HDR parameters can be used. However, the ST 2094 suite of standards does not define the way in which the parameters are to be signaled (for example, provided in an encoded bitstream). Standards Development Organizations (SDOs), such as the European Telecommunications Standards Institute (ETSI), the CTA, and MPEG, are expected to develop standards for transmitting the HDR parameters in an encoded bitstream. For example, CTA-861-G is a standard that specifies a format for transmitting the dynamic HDR metadata defined in ST 2094 over digital audio/video interfaces.
[00210] In HEVC and AVC bitstreams, among others, the HDR parameters can be provided using SEI messages. A format for encoding the parameters defined by ST 2094-20 in SEI messages is defined, for example, in ETSI Technical Specification 103 433. The parameters defined by ST 2094-30 can be encoded, for example, in the Color Remapping Information SEI message of HEVC or AVC.

[00211] An example of an SEI message for signaling the parameters defined by ST 2094-10 in HEVC was proposed to the Joint Collaborative Team on Video Coding (JCT-VC) in JCTVC-X004, but this proposal was not adopted.
[00212] For specific systems, standards bodies such as the Advanced Television Systems Committee (ATSC) and Digital Video Broadcasting (DVB) can define a format for signaling the ST 2094-10 parameters. For example, the parameters can be encoded in user data registered SEI messages, which can be added to an encoded bitstream. Such SEI messages can be optimized for the HDR video systems for which the standards bodies provide specifications. In addition, the SEI messages can be defined in such a way that the SEI messages unambiguously define an implementation of the ST 2094-10 parameters for a specific HDR video system.
[00213] Systems and methods are provided for a standardized mechanism for encoding ST 2094-10 metadata in encoded bitstreams. The techniques described below can provide an unambiguous definition for the reception and parsing of ST 2094-10 metadata in an HDR video system. The techniques can also reduce the complexity of implementing receivers. The techniques include signaling of the color space and of information that can be used to convert the color space, restricting the number of ST 2094-10 processing elements to be used by an HDR video system, restricting the use of ST 2094-10 reserved values, restricting the signaling of mandatory supplemental information, signaling of target display capabilities such as color gamut and minimum luminance, coding the block length (for example, the field indicating the length of a metadata block) with fixed-length coding, and specifying associations between the ST 2094-10 processing elements to be used by the HDR video system, among other things.
[00214] These techniques provide a standardized signaling mechanism where none is defined by ST 2094-10. These techniques can also define a complete video system by filling gaps not defined by ST 2094-10, such as the way in which the HDR parameters are to be used, a description of the input, and the output color volume transform to be used, among other things. The incorporation of ST 2094-10 elements into SEI messages is described below, as well as restrictions on the use of these elements.
[00215] In a first example, the color space and the conversion can be signaled for an HDR video system that implements ST 2094-10. As discussed above, ST 2094-10 defines metadata that can indicate the color space of an input video signal and metadata that can be used to convert the input color space to another color space. Examples of the input and working color spaces can include YCbCr and ICtCp (where I is the luma component and Ct and Cp are the blue-yellow and red-green chroma components, respectively), which is defined by ITU-T Recommendation BT.2100. In some examples, exact color transform matrices can be signaled for converting the input color space to the working color space and for converting the working color space to an output color space. In these examples, the working color space is one in which a tone mapping function can be applied.
[00216] In some examples, a decoder or receiver of an HDR video system may select a color transform matrix from among sets of matrices provided in an encoded bitstream. The decoder can, for example, use the values of the HEVC Video Usability Information (VUI) parameters, colour_primaries and/or matrix_coeffs, to select a color transform matrix.
[00217] In some examples, restrictions can be defined for the parameters of the color transform matrices. These restrictions can simplify the implementation of receivers. For example, the entries of a color transform matrix can be limited to the values defined by the YCbCr or ICtCp color spaces.
[00218] In some examples, a flag can be used to indicate whether a particular color transform and any related offsets are present in a bitstream. For example, a flag called RGBtoLMS_coef_present_flag can indicate whether parameters for converting the RGB color space to LMS are available in the bitstream (LMS is a color space that represents the response of the three types of cones of the human eye, and is named for the peaks of responsivity or sensitivity at long, medium, and short wavelengths). As another example, a flag called Yctorgb_coef_present_flag can indicate whether the bitstream includes parameters that can be used to perform a YCbCr-to-RGB color conversion. In some cases, a value of 1 for either of these flags can indicate that the color conversion parameters and any related offsets are present, and a value of 0 can indicate that the color conversion parameters are not available in the bitstream.
[00219] In some examples, when the color transform parameters are not present, the values of the coefficients are inferred to be those of the identity matrix. In some examples, when the color transform parameters are not present, the parameter values are inferred to be zero. In some examples, other default values for the color transform parameters and offsets are possible.
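A minimal sketch of this inference rule (the dictionary field names and the 3 x 3 matrix representation are assumptions for illustration):

    IDENTITY = [[1.0, 0.0, 0.0],
                [0.0, 1.0, 0.0],
                [0.0, 0.0, 1.0]]

    def get_rgb_to_lms_transform(metadata: dict):
        # If the present flag is 0, infer an identity matrix and zero offsets.
        if metadata.get("RGBtoLMS_coef_present_flag", 0) == 1:
            return metadata["rgb_to_lms_coefs"], metadata["rgb_to_lms_offsets"]
        return IDENTITY, [0.0, 0.0, 0.0]

    matrix, offsets = get_rgb_to_lms_transform({})  # nothing signaled
    print(matrix == IDENTITY, offsets)              # -> True [0.0, 0.0, 0.0]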
[00220] In a second example, the number of ST 2094-10 processing elements that are used by an HDR video system can be restricted. Restricting the number of processing elements can both provide an unambiguous definition for the reception and parsing of ST 2094-10 parameters and simplify the implementation of an HDR receiver. For example, ST 2094-10 does not specify a number of extension blocks that can be included in a bitstream, where the extension blocks can include processing elements. Having an undefined number of extension blocks can mean that the amount of memory that a decoder needs to store the blocks, and the amount of processing resources needed to process the blocks, may be unknown. Thus, in various examples, the number of extension blocks can be restricted, so that decoders can determine the memory and processing resources needed to process the blocks. ST 2094-10 processing elements can include processed picture fragments, processing windows, content description elements, target display description elements, and tone mapping models, among other things. The following examples can be used individually or in any suitable combination.
[00221] Processed picture fragments and processing windows (referred to as ProcessingWindow by ST 2094) describe parts of a display. For example, a display can include multiple, possibly overlapping, windows, where each window can display a different video signal. An example of multiple windows on the same display is picture-in-picture, where an inset window in the display can include a different video signal than the video signal being output to the main part of the display. In some examples, the number of processed picture fragments and processing windows in a video signal is limited to a fixed number less than 254. For example, the number can be set equal to a value from 1 to 16. As provided by ST 2094, the ext_block_level field for processed picture fragments and processing windows is set to 5. In accordance with the restriction on the number of processed picture fragments and processing windows, as another example, the number of extension blocks with an ext_block_level of 5 can be restricted to one.
[00222] Content description elements (referred to as ImageCharacteristicsLayer by ST 2094) can provide information about a particular video signal. In some examples, the number of content description elements is set to 1. Content description element blocks have an ext_block_level value of 1. In some examples, the number of extension blocks with ext_block_level equal to 1 can be restricted to one.
[00223] Target display description elements (referred to as TargetedSystemDisplay by ST 2094) can provide information about a display device on which a video signal can be displayed. In some examples, the number of target display description elements is a value in the range of 1 to 16. Extension blocks for target display description elements have an ext_block_level value of 2. In some examples, the number of extension blocks with an ext_block_level of 2 can be restricted to be less than or equal to 16.
[00224] Tone mapping models can be used to map one set of colors to a second set of colors. For example, the second set of colors can approximate the appearance of HDR images on a system that has a more limited dynamic range. In some examples, for an HDR system that implements ST 2094-10, the number of tone mapping models (referred to as ColorVolumeTransform in ST 2094) is a value from 1 to 16.
[00225] In some examples, the number of SEI messages signaling ST 2094-10 related information may not exceed two for each coded frame or access unit. In some examples, each access unit will have an SEI message associated with ST 2094-10 metadata. In some examples, when such an SEI message is present, there will be only one per access unit.

[00226] In a third example, the use of values that are reserved in ST 2094-10 is restricted. Limiting the use of reserved values can ensure that no unspecified or unauthorized information is included in a bitstream.
[00227] In some examples, bit streams that comply with the current version of ST 2094-10 should not include reserved values. For example, for extension blocks, some values for ext_block_level are reserved for use by ATSC. In these examples, these reserved values cannot be used in an HDR video system that implements ST 2094-10. Alternatively, in some examples, extension blocks that use a value reserved for ext_block_level will be ignored by a decoder.
[00228] In some examples, a decoder will discard ST 2094-10 SEI messages that contain reserved values. In some cases, when an ext_block_level value for an SEI message is a value other than 1, 2, or 5, the SEI message shall be discarded. In some cases, when an ext_block_level value is equal to a reserved value, the SEI message shall be discarded. Such examples can close a loophole by which arbitrary data of any size could be inserted into an SEI message.
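A minimal sketch of this discard rule, combined with the count restrictions from the second example (the block representation is an assumption for illustration):

    ALLOWED_LEVELS = {1, 2, 5}
    MAX_BLOCKS_PER_LEVEL = {1: 1, 2: 16, 5: 1}  # counts from the restrictions above

    def sei_message_is_valid(extension_blocks) -> bool:
        # Reject the SEI message if any block uses a reserved ext_block_level,
        # or if a level occurs more often than the restrictions allow.
        counts = {}
        for block in extension_blocks:
            level = block["ext_block_level"]
            if level not in ALLOWED_LEVELS:
                return False  # reserved value: discard the SEI message
            counts[level] = counts.get(level, 0) + 1
            if counts[level] > MAX_BLOCKS_PER_LEVEL[level]:
                return False
        return True

    print(sei_message_is_valid([{"ext_block_level": 1}, {"ext_block_level": 5}]))  # True
    print(sei_message_is_valid([{"ext_block_level": 7}]))                          # False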
[00229] In some examples, values of ext_block_level other than 1, 2, or 5 are reserved for future use by ATSC.

[00230] In some examples, values of ext_block_level other than 1, 2, or 5 are not allowed to be used in a system based on the ATSC standard.
[00231] In a fourth example, restrictions can be placed on mandatory supplemental information. The ATSC specification provides a toolbox that can be used to signal information about a video signal. This toolbox allows a bitstream to encode multiple transfer functions, including the transfer functions defined by BT.709, BT.2020, and the hybrid log-gamma (HLG) system, among others. Restricting the number of combinations, however, can simplify the implementation of a decoder. In some examples, SEI messages with a payloadType value of 4 can be used to convey the characteristics of different transfer functions.
[00232] In some examples, an ST 2094-10 SEI message will not be present when the transfer characteristics syntax element in HEVC VUI is not equal to 16 (for the ST 2084 PQ transfer function).
[00233] In some examples, the ST 2094-10 SEI message will not be present when the transfer characteristics syntax element in the HEVC VUI is not equal to 16 (for the ST 2084 PQ transfer function) or 18 (for the HLG transfer function).
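As an illustrative sketch (Python; function and parameter names are ours), the presence condition above amounts to a simple membership test on the VUI value:

    def st2094_10_sei_allowed(transfer_characteristics, allow_hlg=False):
        # 16 = SMPTE ST 2084 (PQ); 18 = HLG, allowed only in the second variant
        allowed = {16}
        if allow_hlg:
            allowed.add(18)
        return transfer_characteristics in allowed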
[00234] In some examples, a SEI message with master display color volume metadata, as defined by ST 2086, should be included in a bit stream that has a SEI message with ST 2094-10 parameters. In these examples, the syntax elements in the ST 2094-10 SEI message that carry the same information as the SEI message for ST 2086 can be removed from the SEI message for the ST 2094-10 parameters. In these examples, the corresponding information needed for ST 2094-10 frame processing can be extracted from the ST 2086 SEI message.
[00235] In some examples, when an ST 2086 SEI message is present in the bit stream, the syntax elements that are common between ST 2086 and ST 2094-10 are not signaled in the SEI message for ST 2094-10. Instead, the ST 2094-10 syntax elements can be inferred to be the same as the corresponding syntax elements in the ST 2086 SEI message.
[00236] In some examples, an indicator (called, for example, st2086_info_present_flag) can be used to indicate whether the syntax elements that are common between ST 2086 and ST 2094-10 are signaled in the SEI message for ST 2094-10.
[00237] In some examples, ST 2086 syntax elements are included in the SEI message for ST 2094-10. In these examples, for bit streams or access units with an ST 2094-10 SEI message present, ST 2086 SEI messages are not allowed.
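A minimal sketch of the inference rule (Python; the dict-based SEI model and names are ours):

    COMMON_FIELDS = ("display_primaries_x", "display_primaries_y",
                     "white_point_x", "white_point_y")

    def resolve_common_elements(st2094_10_sei, st2086_sei):
        # When st2086_info_present_flag is 0, the common syntax elements
        # are inferred from the ST 2086 SEI message.
        if st2094_10_sei.get("st2086_info_present_flag", 1):
            return {k: st2094_10_sei[k] for k in COMMON_FIELDS}
        return {k: st2086_sei[k] for k in COMMON_FIELDS}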
[00238] In some examples, when the master display information in an ST 2086 SEI message conflicts with the display information in an ST 2094-10 SEI message, the ST 2094-10 information takes precedence for processing a frame.
[00239] In a fifth example, target display capabilities, such as color gamut and minimum luminance, can be signaled or indicated in a bit stream. The target display capabilities may indicate minimum requirements for a display device to be able to display a video signal encoded in a bit stream.
[00240] In some examples, an extension block that has a value of 2 for ext_block_level may include a target display minimum luminance, target primaries, and a target white point.
[00241] In some examples, an additional block type can alternatively or additionally be added. Extension blocks with this type can contain the primaries, the white point and the minimum luminance of the target display. For example, an extension block with ext_block_level equal to 3, which would otherwise be reserved, could be used.
[00242] In a sixth example, the ext_block_length field for extension blocks can be encoded using a fixed length. The ext_block_length field can indicate the size of an extension block. For example, when the ext_block_level for an extension block is set to 1, the corresponding ext_block_length can be set to 5. As another example, when ext_block_level is set to 2, ext_block_length can be set to 11. As another example, when ext_block_level is set to 5, ext_block_length can be set to 7. Limiting the number of bits that can be used for ext_block_length can simplify the implementation of a decoder.
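A non-normative sketch of the fixed-length check (Python; names are ours):

    EXPECTED_EXT_BLOCK_LENGTH = {1: 5, 2: 11, 5: 7}   # bytes, by ext_block_level

    def ext_block_length_is_valid(ext_block_level, ext_block_length):
        # Levels without a fixed length are not constrained by this check.
        expected = EXPECTED_EXT_BLOCK_LENGTH.get(ext_block_level)
        return expected is None or ext_block_length == expected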
[00243] In some examples, the number of bits used to encode the ext_block_length syntax element is chosen to be a fixed multiple of 8, or another suitable multiple.
[00244] In some examples, the range of values for ext_block_length is restricted to be between 0 and 255, inclusive.
[00245] In some examples, a constraint may instead be placed on the number of times the ext_dm_alignment_zero_bit syntax element is signaled in an ext_dm_data_block_payload() data structure. For example, the number of times the syntax element appears can be limited to be less than 7. The ext_dm_data_block_payload() data structure can be used to indicate different sets of parameters. For example, when ext_block_level is equal to 1, the ext_dm_data_block_payload() data structure can provide content range values, such as minimum, maximum and average PQ values. As another example, when ext_block_level is 2, ext_dm_data_block_payload() can include trim values, such as slope, offset, power, chroma weight and saturation gain, among other things. As another example, when ext_block_level is 5, ext_dm_data_block_payload() can describe an active area, also referred to here as a display region. The ext_dm_data_block_payload() can also include a number of ext_dm_alignment_zero_bit elements, which can pad the data structure to a particular size.
[00246] In a seventh example, the associations between ST 2094 processing elements can be specified.
[00247] As noted above, the ext_dm_data_block_payload() data structure can provide information related to color gamut mapping parameters and scene parameters. For example, one or more ext_dm_data_block_payload() data structures may include a set of color volume transformation parameters, which can be used by a decoder or receiver to transform a video signal into one that can be displayed by a particular device. In some examples, a specification can be provided to associate the color volume transformation parameters in ext_dm_data_block_payload() data structures with active regions on a display. In some examples, the video that is displayed on a device may have more than one display region, where each region can output a different video signal. In these examples, more than one video signal can be encoded in a bit stream. Each video signal can be associated with a set of color volume transformation parameters. In some cases, two video signals can be associated with the same set of color volume transformation parameters. Several techniques can be used to determine with which display region a set of color volume transformation parameters is associated.
[00248] In some examples, an index can be used to indicate an association between a set of color volume parameters and a display region. For example, for each ext_dm_data_block_payload() that does not indicate a display region to which the information in the data structure applies (for example, when ext_block_level is equal to 1 or 2), a syntax element can be used to indicate an association. For example, a SEI message can include a syntax element in the form of a list of indices, where the order of the indices corresponds to the order in which the ext_dm_data_block_payload() data structures appear in the bit stream. In this example, the index values can indicate one or more display regions with which each ext_dm_data_block_payload() is associated. As another example, active display regions can be indicated in a SEI message. In this example, a syntax element in the SEI message can indicate an active display region with which an ext_dm_data_block_payload() is associated. In this example, the active display regions can be identified in the order in which the active display regions are indicated in the SEI message, or each active display region can have an identifier.
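The index-list variant can be sketched as follows (Python; the list-based representation and names are ours):

    def associate_by_index(payloads, region_indices, regions):
        # payloads[i] is associated with regions[region_indices[i]]; the
        # list order matches the order of payloads in the bit stream.
        return {i: regions[region_indices[i]] for i in range(len(payloads))}

    # Example: payload 0 -> region 2, payload 1 -> region 0:
    # associate_by_index(["p0", "p1"], [2, 0], ["r0", "r1", "r2"])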
[00249] In some examples, the association of the color volume transformation parameters provided by an ext_dm_data_block_payload() data structure with display regions can be based on the order in which the ext_dm_data_block_payload() data structures and/or the display regions appear in a bit stream. For example, restrictions can be placed on the order in which different types of ext_dm_data_block_payload() data structures appear in a bit stream. The type of an ext_dm_data_block_payload() can be indicated by the ext_block_level syntax element. In this example, the order in which the ext_dm_data_block_payload() data structures appear indicates the display region with which the data structures are associated.
[00250] As an example of restricting the order of ext_dm_data_block_payload() data structures: for any value of i in the range 0 to num_ext_blocks - 1, inclusive (num_ext_blocks indicates the total number of extension blocks), where ext_dm_data_block_payload(i) indicates parameters for color gamut mapping, if there is any value j in the range 0 to num_ext_blocks - 1, inclusive, such that j is the smallest number greater than i for which ext_dm_data_block_payload(j) contains information about one or more active regions, and there is a k greater than j such that k is the smallest number greater than j for which ext_dm_data_block_payload(k) indicates parameters for color gamut mapping, then ext_dm_data_block_payload(i) is associated with the regions indicated by ext_dm_data_block_payload(m) for m in the range j to k - 1, inclusive. Alternatively or additionally, if there is any value j in the range 0 to num_ext_blocks - 1, inclusive, such that j is the smallest number greater than i for which ext_dm_data_block_payload(j) contains information about one or more active regions, and there is no value of k greater than j such that ext_dm_data_block_payload(k) indicates parameters for color gamut mapping, then ext_dm_data_block_payload(i) is associated with the regions indicated by ext_dm_data_block_payload(m) for m in the range j to num_ext_blocks - 1, inclusive. Alternatively or additionally, ext_dm_data_block_payload(i) applies to the entire image.
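A non-normative Python sketch of this order-based rule, assuming each parsed block records its ext_block_level (1 or 2 for color gamut mapping parameters, 5 for an active region); it maps each parameter block index to the region block indices it is associated with, an empty list meaning the whole image:

    def derive_associations(levels):
        n = len(levels)
        is_params = lambda x: levels[x] in (1, 2)
        is_region = lambda x: levels[x] == 5
        assoc = {}
        for i in range(n):
            if not is_params(i):
                continue
            j = next((x for x in range(i + 1, n) if is_region(x)), None)
            if j is None:
                assoc[i] = []        # no region follows: whole image
                continue
            k = next((x for x in range(j + 1, n) if is_params(x)), None)
            end = k if k is not None else n
            assoc[i] = [m for m in range(j, end) if is_region(m)]
        return assoc

    # Example: derive_associations([1, 5, 5, 2, 5]) == {0: [1, 2], 3: [4]}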
[00251] As another example of restricting the order of ext_dm_data_block_payload() data structures, the parameters for color gamut mapping may include one or more syntax elements that do not indicate a region for application of the color gamut mapping.
[00252] In some examples, the association of color volume transformation parameters with display regions can be based on block association. For example, an ext_dm_data_block_payload() can be included in a bit stream with a particular ext_block_level value (for example, 6 or another suitable value), where an extension block of this type can indicate an association between color gamut mapping parameters, target display characteristics, scene information, and active regions.
[00253] As an example, the ext_dm_data_block_payload() data structure can signal or indicate a number of associations between color gamut mapping parameters, target display characteristics, scene information (collectively, color volume transform parameters) and active regions.
[00254] As another example, for each association, the ext_dm_data_block_payload() data structure can include one or more values that indicate the number of blocks used to define the association. In this example, in some cases, the one or more values for each association are not signaled explicitly and can be set to a preset value. Alternatively or additionally, a syntax element can be used to indicate the mode of association. In such examples, for each mode, the one or more values can be inferred from the mode, or can be signaled. In such examples, for each mode, one or more blocks of a particular value can be specified to be present in the association. Alternatively or in addition, in some examples, for each association, a syntax element can signal the indices corresponding to the blocks that specify the association. In such examples, the indices may correspond to the index of the ext_dm_data_block_payload() data structure as signaled in the SEI message. In such examples, the indices for a particular ext_block_level value may correspond to the index of the ext_dm_data_block_payload syntax structure of that particular ext_block_level value as signaled in the SEI message.
[00255] In some examples, explicit region information is sent along with each set of color gamut mapping parameters, scene information and target display characteristics.
[00256] In some examples, the scene information may include one or more syntax elements that indicate the minimum, maximum and average luminance information of the scene. The color gamut mapping parameters can include parameters of the mapping functions used to do color gamut mapping. The target display characteristics can include display characteristics including the minimum and maximum luminance, the primary colors and the white point of the display. The region information can include the coordinates that indicate a region (for example, four coordinates for a rectangular region) to which a subset of parameters is applicable, one or more identifiers associated with the region, and one or more parameters (describing shapes in the color coordinate domain or spatial domain) to further specify the sub-region of the region where the mapping should be applied.
[00257] In some examples, color gamut mapping parameters can be used to indicate all information in an ext_dm_data_block_payload() data structure that is not related to signaling regions (for example, color gamut mapping parameters, scene information and target display characteristics).
[00258] In an eighth example, signaling additional syntax elements and modifying the syntax structure can be performed to allow for the possibility of future extensibility using reserved values of ext_block_level. This can include signaling a syntax element occupying one bit as many times as there are bits in ext_dm_data_block_payload for ext_block_level values that are reserved in the current version of the specification.
[00259] In several examples, a decoder or receiver of an HDR video system can perform compliance checks on a bit stream. For example, the decoder or receiver can check whether the restrictions and limitations described above have been adhered to. The decoder can perform a compliance check, for example, in line with decoding the bit stream or before starting to decode the bit stream. When a bit stream or part of a bit stream fails a compliance check, the decoder can take several actions. For example, the decoder can ignore a data structure that fails a compliance check, and can proceed with decoding the bit stream after the data structure. As another example, the decoder can stop decoding the bit stream from the point at which the bit stream fails the compliance check. As an additional example, the decoder can reject the entire bit stream.
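The three behaviors can be sketched as follows (Python; the conforms() predicate stands in for the checks described in this section, and all names are ours):

    def conforms(payload):
        # Placeholder check: only non-reserved block levels pass.
        return payload.get("ext_block_level") in (1, 2, 5)

    def decode_payloads(payloads, on_failure="skip"):
        decoded = []
        for idx, payload in enumerate(payloads):
            if conforms(payload):
                decoded.append(payload)       # stand-in for actual decoding
            elif on_failure == "skip":
                continue                      # ignore this data structure only
            elif on_failure == "stop":
                break                         # stop at the failure point
            else:
                raise ValueError(f"non-conforming payload at index {idx}")
        return decoded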
[00260] Several exemplary implementations of the methods described above will now be described. The following example implementations implement one or more of the examples described above. The exemplary implementations are illustrated using ATSC-defined syntax structures and semantics. Changes to the syntax structures and semantics are indicated as follows: [[text within double brackets]] indicates deletions and underlined text indicates additions.
FIRST EXAMPLE [00261] Changes in syntax structures:
Table E.2: ext_dm_data_block()

ext_dm_data_block() {                                                     Descriptor
    ext_block_length[i]                                                   ue(v)
    ext_block_level[i]                                                    u(8)
    ext_dm_data_block_payload(ext_block_length[i], ext_block_level[i])
}

Table E.3: ext_dm_data_block_payload()

ext_dm_data_block_payload(ext_block_length, ext_block_level) {            Descriptor
    ext_block_len_bits = 8 * ext_block_length
    ext_block_use_bits = 0
    if (ext_block_level == 1) {
        min_PQ                                                            u(12)
        max_PQ                                                            u(12)
        avg_PQ                                                            u(12)
        ext_block_use_bits += 36
    }
    if (ext_block_level == 2) {
        target_max_PQ                                                     u(12)
        trim_slope                                                        u(12)
        trim_offset                                                       u(12)
        trim_power                                                        u(12)
        trim_chroma_weight                                                u(12)
        trim_saturation_gain                                              u(12)
        ms_weight                                                         i(13)
        ext_block_use_bits += 85
    }
    if (ext_block_level == 5) {
        active_area_left_offset                                           u(13)
        active_area_right_offset                                          u(13)
        active_area_top_offset                                            u(13)
        active_area_bottom_offset                                         u(13)
        ext_block_use_bits += 52
    }
    if (ext_block_level == 6) {
        num_associations                                                  u(4)
        for (i = 0; i < num_associations; i++) {
            num_blks_in_assoc[i]                                          u(8)
            for (j = 0; j < num_blks_in_assoc[i]; j++)
                blk_idx_in_assoc[i][j]                                    u(8)
            ext_block_use_bits += 8 * num_blks_in_assoc[i] + 8
        }
        ext_block_use_bits += 4
    }
    if (ext_block_level == 1 || ext_block_level == 2 || ext_block_level == 5 || ext_block_level == 6)
        while (ext_block_use_bits++ < ext_block_len_bits)
            ext_dm_alignment_zero_bit                                     f(1)
    else
        while (ext_block_use_bits++ < ext_block_len_bits)
            ext_dm_data_bit                                               u(1)
}
SECOND EXAMPLE [00262] Changes in semantics:
[00263] ext_block_length[i] is used to derive the payload size of the i-th extended DM metadata block in bytes. ext_block_length[i] is not present if num_ext_blocks is equal to 0. The value of ext_block_length must be in the range 0 to 255, inclusive.
[00264] Alternatively, the syntax element is encoded as ext_block_length_minus1 and the semantics are specified as follows:
[00265] ext_block_length_minus1[i] plus 1 [[is used to derive]] specifies the payload size of the i-th extended DM metadata block in bytes. [[ext_block_length[i] is not present if num_ext_blocks is equal to 0.]] The value of ext_block_length_minus1 must be in the range 0 to 255, inclusive.
Table E.4: Extended DM metadata block type definition
ext_block_level    Extended DM metadata block type
0                  Reserved
1                  Level 1 Metadata - Content range
2                  Level 2 Metadata - Trim pass
3                  Reserved
4                  Reserved
5                  Level 5 Metadata - Active area
6                  Level 6 Metadata - Association
7...255            Reserved

[00266] num_associations specifies the number of associations specified in the ext_dm_data_block_payload. The association block specifies the associations between target display characteristics, color gamut mapping parameters and the active regions associated with the color gamut mapping parameters.
[00267] num_blks_in_assoc[i] specifies the number of blocks that are specified in the i-th association. The value of num_blks_in_assoc[i] must be in the range 0 to 255, inclusive.
[00268] blk_idx_in_assoc[i][j] specifies the index of the j-th block in the i-th association. The value of blk_idx_in_assoc[i][j] must be in the range 0 to num_ext_blocks - 1, inclusive.
[00269] It is a requirement of bit stream conformance that, for each block with index k that has an ext_block_level value of 1, 2 or 5, there must be at least one value of i in the ext_dm_data_block_payload syntax structure with ext_block_level equal to 6 such that blk_idx_in_assoc[i][j] is equal to k for some j.
[00270] It is a requirement of bit stream conformance that there shall be no more than one ext_dm_data_block_payload syntax structure with ext_block_level equal to 6.
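These two conformance requirements can be checked with a short, non-normative sketch (Python; names are ours):

    def check_association_conformance(levels, blk_idx_in_assoc):
        # levels[k] is ext_block_level of block k; blk_idx_in_assoc is the
        # nested index list carried by the level-6 structure(s).
        if levels.count(6) > 1:
            return False                      # at most one level-6 structure
        referenced = {k for assoc in blk_idx_in_assoc for k in assoc}
        required = {k for k, lvl in enumerate(levels) if lvl in (1, 2, 5)}
        return required <= referenced         # every level 1/2/5 block covered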
[00271] ext_dm_data_bit can have any value. Its presence and value do not affect the decoder's conformance to profiles specified in this version of this Specification. Decoders conforming to this version of this Specification shall ignore all ext_dm_data_bit syntax elements.
[00272] In an alternative, the byte alignment / future-proofing syntax is specified as follows:
THIRD EXAMPLE [00273] Changes to syntax elements:
Table E.1: ST2094-10_data()

ST2094-10_data() {                                   Descriptor
    ...
    for (i = 0; i < num_ext_blocks; i++) {
        ...
        if (ext_block_level == 0) {
            min_PQ                                   u(12)
            max_PQ                                   u(12)
            avg_PQ                                   u(12)
        }
        if (ext_block_level == 1) {
            ...
            white_point_x                            u(16)
            white_point_y                            u(16)
            trim_slope                               u(12)
            trim_offset                              u(12)
            trim_power                               u(12)
            trim_chroma_weight                       u(12)
            trim_saturation_gain                     u(12)
            ms_weight_present_flag                   u(1)
            if (ms_weight_present_flag)
                ms_weight                            i(13)
        }
        if (ext_block_level == 2) {
            active_area_left_offset                  u(13)
            active_area_right_offset                 u(13)
            active_area_top_offset                   u(13)
            active_area_bottom_offset                u(13)
        }
    }
}

[00274] Changes in semantics:
Extended display mapping metadata block type definition

ext_block_level    Extended display mapping metadata block type
0                  Level 0 Metadata - Content range
1                  Level 1 Metadata - Trim pass
2                  Level 2 Metadata - Active area
3...255            ATSC Reserved
[00275] target_min_PQ specifies the minimum luminance value of a target display in 12-bit PQ encoding. The value must be in the range 0 to 4095, inclusive. If target_min_PQ is not present, it must be inferred to be equal to the value of source_min_PQ. target_min_PQ is the PQ-encoded value of TargetedSystemDisplayMinimumLuminance, as defined in clause 10.4 of SMPTE ST 2094-1 [2]. The 10-bit PQ encoding uses the most significant bits.
[00276] display_primaries_x[c] and display_primaries_y[c] specify the normalized chromaticity coordinates x and y, respectively, of the color primary component c of the master display in increments of 0.00002, according to the CIE 1931 definition of x and y as specified in ISO 11664-1 (see also ISO 11664-3 and CIE 15). For describing master displays that use red, green and blue color primaries, it is suggested that the index value c equal to 0 correspond to the green primary, c equal to 1 correspond to the blue primary and c equal to 2 correspond to the red primary (see also Annex E and Table E.3). The values of display_primaries_x[c] and display_primaries_y[c] must be in the range 0 to 50,000, inclusive.
[00277] white_point_x and white_point_y specify the normalized chromaticity coordinates x and y, respectively, of the white point of the master display in normalized increments of 0.00002, according to the CIE 1931 definition of x and y as specified in ISO 11664-1 (see also ISO 11664-3 and CIE 15). The values of white_point_x and white_point_y must be in the range 0 to 50,000.
FOURTH EXAMPLE
[00278] Changes in syntax elements and semantics. The syntax for the arrays is placed in the form of a loop, as it is a more compact representation.
Table E.1: ST2094-10_type_data()

ST2094-10_type_data() {                              Descriptor
    affected_dm_metadata_id                          ue(v)
    current_dm_metadata_id                           ue(v)
    scene_refresh_flag                               ue(v)
    YCCtoRGB_coef_present_flag                       u(1)
    if (YCCtoRGB_coef_present_flag)
        for (i = 0; i < 9; i++)
            YCCtoRGB_coef[i]                         i(16)
    YCCtoRGB_offset_present_flag                     u(1)
    if (YCCtoRGB_offset_present_flag)
        for (i = 0; i < 3; i++)
            YCCtoRGB_offset[i]                       u(32)
    RGBtoLMS_coef_present_flag                       u(1)
    if (RGBtoLMS_coef_present_flag)
        for (i = 0; i < 9; i++)
            RGBtoLMS_coef[i]                         i(16)
    st2086_info_present_flag                         u(1)
    if (st2086_info_present_flag) {
        source_min_PQ                                u(12)
        source_max_PQ                                u(12)
        ...
    }
    num_ext_blocks                                   ue(v)
    if (num_ext_blocks) {
        while (!byte_aligned())
            dm_alignment_zero_bit                    f(1)
        for (i = 0; i < num_ext_blocks; i++)
            ext_dm_data_block(i)
    }
}
Table E.3: ext_dm_data_block_payload()

ext_dm_data_block_payload(ext_block_length, ext_block_level) {            Descriptor
    ext_block_len_bits = 8 * ext_block_length
    ext_block_use_bits = 0
    if (ext_block_level == 1) {
        min_PQ                                                            u(12)
        max_PQ                                                            u(12)
        avg_PQ                                                            u(12)
        ext_block_use_bits += 36
    }
    if (ext_block_level == 2) {
        target_max_PQ                                                     u(12)
        for (c = 0; c < 3; c++) {
            display_primaries_x[c]                                        u(16)
            display_primaries_y[c]                                        u(16)
        }
        white_point_x                                                     u(16)
        white_point_y                                                     u(16)
        trim_slope                                                        u(12)
        trim_offset                                                       u(12)
        trim_power                                                        u(12)
        trim_chroma_weight                                                u(12)
        trim_saturation_gain                                              u(12)
        ms_weight                                                         i(13)
        ext_block_use_bits += 213
    }
    if (ext_block_level == 5) {
        active_area_left_offset                                           u(13)
        active_area_right_offset                                          u(13)
        active_area_top_offset                                            u(13)
        active_area_bottom_offset                                         u(13)
        ext_block_use_bits += 52
    }
    while (ext_block_use_bits++ < ext_block_len_bits)
        ext_dm_alignment_zero_bit                                         f(1)
}
Table E.4: Extended DM metadata block type definition
ext_block_level    Extended DM metadata block type
0                  Prohibited
1                  Level 1 Metadata - Content range
2                  Level 2 Metadata - Trim pass
3                  Prohibited
4                  Prohibited
5                  Level 5 Metadata - Active area
6...255            Prohibited
[00279] target_min_PQ specifies the minimum luminance value of a target display in 12-bit PQ encoding. The value must be in the range 0 to 4095, inclusive. If target_min_PQ is not present, it must be inferred to be equal to the value of source_min_PQ. target_min_PQ is the PQ-encoded value of TargetedSystemDisplayMinimumLuminance as defined in clause 10.4 of SMPTE ST 2094-1 [2]. The 10-bit PQ encoding uses the most significant bits.
[00280] display_primaries_x[c] and display_primaries_y[c] specify the normalized chromaticity coordinates x and y, respectively, of the color primary component c of the master display in increments of 0.00002, according to the CIE 1931 definition of x and y as specified in ISO 11664-1 (see also ISO 11664-3 and CIE 15). For describing master displays that use red, green and blue color primaries, it is suggested that the index value c equal to 0 correspond to the green primary, c equal to 1 correspond to the blue primary and c equal to 2 correspond to the red primary (see also Annex E and Table E.3). The values of display_primaries_x[c] and display_primaries_y[c] must be in the range 0 to 50,000, inclusive.
[00281] white_point_x and white_point_y specify the normalized chromaticity coordinates x and y, respectively, of the white point of the master display in normalized increments of 0.00002, according to the CIE 1931 definition of x and y as specified in ISO 11664-1 (see also ISO 11664-3 and CIE 15). The values of white_point_x and white_point_y must be in the range 0 to 50,000.
FIFTH EXAMPLE
[00282] Changes in syntax structures and semantics:

Table E.2: ext_dm_data_block()

ext_dm_data_block() {                                                     Descriptor
    ext_block_length[i]                                                   ue(v)
    ext_block_level[i]                                                    u(8)
    ext_dm_data_block_payload(ext_block_length[i], ext_block_level[i])
}

Table E.3: ext_dm_data_block_payload()

ext_dm_data_block_payload(ext_block_length, ext_block_level) {            Descriptor
    ext_block_len_bits = 8 * ext_block_length
    ext_block_use_bits = 0
    if (ext_block_level == 1) {
        min_PQ                                                            u(12)
        max_PQ                                                            u(12)
        avg_PQ                                                            u(12)
        ext_block_use_bits += 36
    }
    if (ext_block_level == 2) {
        target_max_PQ                                                     u(12)
        trim_slope                                                        u(12)
        trim_offset                                                       u(12)
        trim_power                                                        u(12)
        trim_chroma_weight                                                u(12)
        trim_saturation_gain                                              u(12)
        ms_weight                                                         i(13)
        ext_block_use_bits += 85
    }
    if (ext_block_level == 5) {
        active_area_left_offset                                           u(13)
        active_area_right_offset                                          u(13)
        active_area_top_offset                                            u(13)
        active_area_bottom_offset                                         u(13)
        ext_block_use_bits += 52
    }
    if (ext_block_level == 6) {
        num_associations                                                  u(4)
        for (i = 0; i < num_associations; i++) {
            num_blks_in_assoc[i]                                          u(8)
            for (j = 0; j < num_blks_in_assoc[i]; j++)
                blk_idx_in_assoc[i][j]                                    u(8)
            ext_block_use_bits += 8 * num_blks_in_assoc[i] + 8
        }
        ext_block_use_bits += 4
    }
    if (ext_block_level == 1 || ext_block_level == 2 || ext_block_level == 5 || ext_block_level == 6)
        while (ext_block_use_bits++ < ext_block_len_bits)
            ext_dm_alignment_zero_bit                                     f(1)
    else
        while (ext_block_use_bits++ < ext_block_len_bits)
            ext_dm_data_bit                                               u(1)
}
[00283] ext_block_length[i] is used to derive the payload size of the i-th extended DM metadata block in bytes. ext_block_length[i] is not present if num_ext_blocks is equal to 0. The value of ext_block_length must be in the range 0 to 255, inclusive.
[00284] Alternatively, the syntax element is encoded as ext_block_length_minus1 and the semantics are specified as follows:
[00285] ext_block_length_minus1[i] plus 1 [[is used to derive]] specifies the payload size of the i-th extended DM metadata block in bytes. [[ext_block_length[i] is not present if num_ext_blocks is equal to 0.]] The value of ext_block_length_minus1 must be in the range 0 to 255, inclusive.
Table E.4: Extended DM metadata block type definition
ext_block_level    Extended DM metadata block type
0                  Reserved
1                  Level 1 Metadata - Content range
2                  Level 2 Metadata - Trim pass
3                  Reserved
4                  Reserved
5                  Level 5 Metadata - Active area
6                  Level 6 Metadata - Association
7...255            Reserved
[00286] num_associations specifies the number of associations specified by the ext_dm_data_block_payload in the SEI message. The association block specifies the associations between scene information, target display characteristics, color gamut mapping parameters, and the active regions associated with the color gamut mapping parameters.
[00287] num_blks_in_assoc[i] specifies the number of blocks that are specified in the i-th association. The value of num_blks_in_assoc[i] must be in the range 0 to 255, inclusive.
[00288] blk_idx_in_assoc[i][j] specifies the index of the j-th block in the i-th association. The value of blk_idx_in_assoc[i][j] must be in the range 0 to num_ext_blocks - 1, inclusive.
[00289] It is a requirement of bit stream conformance that, for each block with index k such that ext_block_level[k] has a value equal to 1, 2 or 5, there must be at least one value of i such that the ext_dm_data_block_payload(i) syntax structure with ext_block_level equal to 6 has blk_idx_in_assoc[i][j] equal to k for some j.
[00290] It is a requirement of bit stream conformance that there shall not be more than one ext_dm_data_block_payload syntax structure in the SEI message with ext_block_level equal to 6.
[00291] ext_dm_data_bit can have any value. Its presence and value do not affect the decoder's conformance to profiles specified in this version of this Specification. Decoders conforming to this version of this Specification will ignore all ext_dm_data_bit syntax elements.
[00292] In an alternative, the byte alignment / future proofing syntax is specified as follows:
[00293] The following is the text of the ATSC Candidate Standard A/341 Amendment: 2094-10, which includes amendments to ATSC document number S34-262r3. This amendment includes some of the methods described above.
[00294] 1. OVERVIEW
[00295] This document describes the encoding and transport, in the ATSC emission bit stream, of elements of ST 2094-10, Dynamic Metadata for Color Volume Transform - Application #1, which is a technology for the use of dynamic metadata for HDR content. If approved by ATSC, A/341:2017, Video-HEVC (A/341) would be amended according to the edits described here.
[00296] 2. REFERENCES
[00297] The following references would be added to A/341.
[00298] 2.1 Normative References
[00299] [1] SMPTE: Dynamic Metadata for Color Volume Transform - Application #1, Doc. ST 2094-10 (2016), Society of Motion Picture and Television Engineers, White Plains, NY.
[00300] 2.2 Informative References
[00301] [2] SMPTE: Dynamic Metadata for Color Volume Transform - Core Components, Doc. ST 2094-1 (2016), Society of Motion Picture and Television Engineers, White Plains, NY.
[00302] 3. DEFINITION OF TERMS
[00303] No new acronyms, abbreviations or terms would be added to A/341.
[00304] 4. CHANGES TO A/341
[00305] In this section of this document, [ref] indicates that a cross reference to a referenced document that is listed in A/341 would be inserted (or as otherwise described within the square brackets). An actual cross reference to a referenced document listed in this document would be updated with the reference number of the recently added references that would be incorporated into the A/341 document.
[00306] Add a bullet to 6.3.2.2
[00307] Add the bullet item below to the list of bullets found in 6.3.2.2 "PQ Transfer Characteristics":
[00308] • The bit stream may contain SEI messages with payloadType value equal to 4. This allows the optional transmission of the ST 2094-10 metadata message described in [ref to new subsection described below].
[00309] Add a new subsection under Section 6.3.2.2
[00310] Add the text below to A/341 as a new subsection under Section 6.3.2.2 "PQ Transfer Characteristics". The new subsection is entitled Section 6.3.2.2.x "Encoding and Transport of SMPTE ST 2094-10 Metadata Message". (For readability, the following text is not shown in markup.)
[00311] Encoding and Transport of SMPTE ST 2094-10 Metadata Message
[00312] The HEVC video bit stream may contain the 2094-10 metadata message in order to provide dynamic information about the HDR video signal. When a 2094-10 metadata message is present, this information can be used by the display to adapt the delivered HDR images to the capability of the display. In addition, this metadata can be used to derive an SDR (ITU-R BT.709 [ref]) image by receiving devices, such as an ATSC 3.0 converter. The information carried in the 2094-10 metadata message, defined in [ref to new Annex described below], provides a carriage for metadata elements defined in ST 2094-1 [2] and ST 2094-10 [1].
[00313] The 2094-10 metadata, when present, must be encoded and transported as user data registered by an ITU-T Recommendation T.35 Supplemental Enhancement Information (SEI) message through the ATSC1_data() structure defined in Table 14 of ANSI/SCTE 128-1 [ref], and the assigned value of user_data_type_code is shown in [ref to Table x.x].
Table x.x user_data_type_code
user_data_type_code        user_data_type_structure
0x09                       ST2094-10_data()
[00314] The syntax and semantics for the ST2094-10_data() payload shall be as specified in clause [ref to new Annex, Section 1 described below] of [ref to new Annex described below]. The corresponding NAL unit type shall be set equal to PREFIX_SEI_NUT.
[00315] If a 2094-10 metadata message is present, the following restrictions should apply:
[00316] • The 2094-10 metadata message must be associated with each access unit of the bit stream. If this message is present, it shall only be present once per access unit.
[00317] • app_version must be set equal to 0.
[00318] • Mastering Display Colour Volume SEI messages (containing SMPTE ST 2086 static metadata [ref]) must be present in the bit stream.
[00319] • The number of extension blocks with ext_block_level equal to 1 must be constrained to be equal to 1.
[00320] • The number of extension blocks with ext_block_level equal to 2 must be constrained to be less than or equal to 16.
[00321] • The number of extension blocks with ext_block_level equal to 5 must be constrained to be equal to 0 or 1.
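The three count restrictions above can be expressed as a short, non-normative check (Python; names are ours):

    from collections import Counter

    def check_block_count_constraints(levels):
        # levels lists ext_block_level for all extension blocks of a message.
        counts = Counter(levels)
        return counts[1] == 1 and counts[2] <= 16 and counts[5] in (0, 1)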
[00322] Adding a new Annex to A/341
[00323] Add the text below as a new Annex to A/341. The Annex is entitled "Metadata Based on SMPTE ST 2094-10_data". (For readability, the following text is not shown in markup.)
[00324] A.1 METADATA BASED ON ST 2094-10 DATA (NORMATIVE)
[00325] This clause specifies the syntax and semantics of ST2094-10_data().
[00326] The syntax for ST2094-10_data() is shown in Table Y.Y, Table Z.Z and Table N.N.
[00327] The parsing process for each syntax element by the descriptors f(n), i(n), ue(v) and u(n) is described in HEVC [ref].
[00328] Note: metadata elements are defined according to SMPTE standards ST 2086 [ref], ST 2094-1 [2], or ST 2094-10 [1]. The conversion between luminance and 12-bit PQ values can be found in ST 2084 [ref].
Table Y.Y ST2094-10_data()

ST2094-10_data() {                                   Descriptor
    app_identifier                                   ue(v)
    app_version                                      ue(v)
    metadata_refresh_flag                            u(1)
    if (metadata_refresh_flag) {
        num_ext_blocks                               ue(v)
        if (num_ext_blocks) {
            while (!byte_aligned())
                dm_alignment_zero_bit                f(1)
            for (i = 0; i < num_ext_blocks; i++) {
                ext_dm_data_block(i)
            }
        }
    }
    while (!byte_aligned())
        dm_alignment_zero_bit                        f(1)
}
Table Z.Z ext_dm_data_block()

ext_dm_data_block(i) {                                                    Descriptor
    ext_block_length[i]                                                   ue(v)
    ext_block_level[i]                                                    u(8)
    ext_dm_data_block_payload(ext_block_length[i], ext_block_level[i])
}

Table N.N ext_dm_data_block_payload()

ext_dm_data_block_payload(ext_block_length, ext_block_level) {            Descriptor
    ext_block_len_bits = 8 * ext_block_length
    ext_block_use_bits = 0
    if (ext_block_level == 1) {
        min_PQ                                                            u(12)
        max_PQ                                                            u(12)
        avg_PQ                                                            u(12)
        ext_block_use_bits += 36
    }
    if (ext_block_level == 2) {
        target_max_PQ                                                     u(12)
        trim_slope                                                        u(12)
        trim_offset                                                       u(12)
        trim_power                                                        u(12)
        trim_chroma_weight                                                u(12)
        trim_saturation_gain                                              u(12)
        ms_weight                                                         i(13)
        ext_block_use_bits += 85
    }
    if (ext_block_level == 5) {
        active_area_left_offset                                           u(13)
        active_area_right_offset                                          u(13)
        active_area_top_offset                                            u(13)
        active_area_bottom_offset                                         u(13)
        ext_block_use_bits += 52
    }
    while (ext_block_use_bits++ < ext_block_len_bits)
        ext_dm_alignment_zero_bit                                         f(1)
}
[00329] This clause defines the semantics for ST2094-10_data().
[00330] For the purposes of the present clause, the following mathematical functions apply:

    Sign(x) = 1 if x > 0; 0 if x == 0; -1 if x < 0

[00331] Floor(x) is the largest integer less than or equal to x.

    Clip3(x, y, z) = x if z < x; y if z > y; z otherwise

    Round(x) = Sign(x) * Floor(Abs(x) + 0.5)

[00332] / = integer division with truncation of the result toward zero. For example, 7/4 and -7/-4 are truncated to 1 and -7/4 and 7/-4 are truncated to -1.
[00333] The accuracy of the information carried in this SEI message is intended to be adequate for purposes corresponding to the use of SMPTE ST 2094-10 [1].
[00334] app_identifier identifies an application and is set to 1 according to restrictions in section 5 of ST 2094-10 [1].
[00335] app_version specifies the application version and is set to 0.
[00336] metadata_refresh_flag, when set to 1, cancels the persistence of any previous extended display mapping metadata in output order and indicates that extended display mapping metadata follows. The extended display mapping metadata persists from the coded picture with which the SEI message containing ST2094-10_data() is associated (inclusive) to the coded picture with which the next SEI message containing ST2094-10_data() and with metadata_refresh_flag set equal to 1 in output order is associated (exclusive) or (otherwise) to the last picture in the CVS (inclusive). When set to 0, this flag indicates that the extended display mapping metadata does not follow.
[00337] num_ext_blocks specifies the number of extended display mapping metadata blocks. The value must be in the range 1 to 254, inclusive.
[00338] dm_alignment_zero_bit must be equal to 0.
[00339] ext_block_length[i] is used to derive the payload size of the i-th extended display mapping metadata block in bytes. The value must be in the range 0 to 1023, inclusive.
[00340] ext_block_level[i] specifies the level of the payload contained in the i-th extended display mapping metadata block. The value must be in the range 0 to 255, inclusive. The corresponding extended display mapping metadata block types are defined in Table M.M. Values of ext_block_level[i] that are ATSC Reserved shall not be present in bit streams conforming to this version of the ATSC specification. Blocks using ATSC Reserved values shall be ignored.
[00341] When the value of ext_block_level[i] is set equal to 1, the value of ext_block_length[i] shall be set equal to 5.
[00342] When the value of ext_block_level[i] is set equal to 2, the value of ext_block_length[i] shall be set equal to 11.
[00343] When the value of ext_block_level[i] is set equal to 5, the value of ext_block_length[i] shall be set equal to 7.
Table M.M Definition of extended display mapping metadata block type

ext_block_level    Extended display mapping metadata block type
0                  ATSC Reserved
1                  Level 1 Metadata - Content range
2                  Level 2 Metadata - Trim pass
3                  ATSC Reserved
4                  ATSC Reserved
5                  Level 5 Metadata - Active area
6...255            ATSC Reserved
I 1 | ATSC Reserved | ATSC Reserved | | 5 | Level 5 Metadata - Active Area [ô Ϊ55 I ”ATSCR« grass <fc> I [00344] When an extended display mapping metadata block with ext_block_level equal to 5 is present, the following restrictions must apply:
[00345] • An extended display mapping metadata block with ext_block_level equal to 5 must be preceded by at least one extended display mapping metadata block with ext_block_level equal to 1 or 2.
[00346] • Between any two extended display mapping metadata blocks with ext_block_level equal to 5, there must be at least one extended display mapping metadata block with ext_block_level equal to 1 or 2.
[00347] • No extended display mapping metadata block with ext_block_level equal to 1 or 2 will be present after the last extended display mapping metadata block with ext_block_level equal to 5.
[00348] • The metadata of an extended display mapping metadata block with ext_block_level equal to 1 or 2 must be applied to the active area specified by the first extended display mapping metadata block with ext_block_level equal to 5 following this block.
[00349] • When the active area defined by the current extended display mapping metadata block with ext_block_level equal to 5 overlaps with the active area defined by preceding extended display mapping metadata blocks with ext_block_level equal to 5, all metadata of the extended display mapping metadata blocks with ext_block_level equal to 1 or 2 associated with the current extended display mapping metadata block with ext_block_level equal to 5 must be applied to the pixel values of the overlapping area.
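A short, non-normative sketch of the block-to-active-area rule in these bullets (Python; the list representation and names are ours):

    def active_area_for_blocks(levels):
        # For each level 1/2 block, the first level-5 block following it
        # supplies its active area; conforming streams always have one.
        n = len(levels)
        return {i: next((j for j in range(i + 1, n) if levels[j] == 5), None)
                for i in range(n) if levels[i] in (1, 2)}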
[00350] min_PQ specifies the minimum luminance value of the current image in 12-bit PQ encoding. The value must be in the range 0 to 4095, inclusive. Note that the 12-bit min_PQ value is calculated as follows:

    min_PQ = Clip3(0, 4095, Round(Min * 4095))

[00351] where Min is MinimumPqEncodedMaxrgb as defined in clause 6.1.3 of SMPTE ST 2094-10 [1].
[00352] max_PQ specifies the maximum luminance value of the current image in 12-bit PQ encoding. The value must be in the range 0 to 4095, inclusive. Note that the 12-bit max_PQ value is calculated as follows:

    max_PQ = Clip3(0, 4095, Round(Max * 4095))

[00353] where Max is MaximumPqEncodedMaxrgb as defined in clause 6.1.5 of SMPTE ST 2094-10 [1].
[00354] avg_PQ specifies the average PQ code value for the luminance of the image in 12-bit PQ encoding. The value must be in the range 0 to 4095, inclusive. Note that the PQ 12-bit average value is calculated as follows:
    avg_PQ = Clip3(0, 4095, Round(Avg * 4095))

[00355] where Avg is AveragePqEncodedMaxrgb as defined in clause 6.1.4 of SMPTE ST 2094-10 [1].
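The clip-and-round used by min_PQ, max_PQ and avg_PQ can be sketched in Python (helper names are ours; Round() follows the definition given earlier in this clause):

    import math

    def sign(x):
        return (x > 0) - (x < 0)

    def clip3(lo, hi, v):
        return lo if v < lo else hi if v > hi else v

    def round_(x):
        # Round(x) = Sign(x) * Floor(Abs(x) + 0.5)
        return sign(x) * math.floor(abs(x) + 0.5)

    def encode_pq12(value):
        # value is the normalized PQ-domain quantity (Min, Max or Avg)
        return clip3(0, 4095, round_(value * 4095))

    # Example: encode_pq12(0.5) == 2048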
[00356] target_max_PQ specifies the maximum luminance value of a target display in 12-bit PQ encoding. The value must be in the range 0 to 4095, inclusive. The value of target_max_PQ is the PQ-encoded value of TargetedSystemDisplayMaximumLuminance as defined in clause 10.4 of SMPTE ST 2094-1 [2].
[00357] Note: This SEI message does not support the signaling of TargetedSystemDisplayPrimaries, TargetedSystemDisplayWhitePointChromaticity, and TargetedSystemDisplayMinimumLuminance, which are specified as mandatory in ST 2094-10 [1].
[00358] If there is more than one extended display mapping metadata block with ext_block_level equal to 2, these blocks shall not have any duplicated target_max_PQ values.
[00359] trim_slope specifies the slope metadata. The value must be in the range 0 to 4095, inclusive. If trim_slope is not present, it must be inferred to be equal to 2048. Note that the 12-bit slope value is calculated as follows:

    trim_slope = Clip3(0, 4095, Round((S - 0.5) * 4096))

[00360] where S is the ToneMappingGain as defined in clause 6.2.3 of SMPTE ST 2094-10 [1].
[00361] trim_offset specifies the offset metadata. The value must be in the range 0 to 4095, inclusive. If trim_offset is not present, it must be inferred to be equal to 2048. Note that the 12-bit offset value is calculated as follows:

    trim_offset = Clip3(0, 4095, Round((O + 0.5) * 4096))

[00362] where O is the ToneMappingOffset as defined in clause 6.2.2 of SMPTE ST 2094-10 [1].
[00363] trim_power specifies the power metadata. The value must be in the range 0 to 4095, inclusive. If trim_power is not present, it must be inferred to be equal to 2048. Note that the 12-bit power value is calculated as follows:

    trim_power = Clip3(0, 4095, Round((P - 0.5) * 4096))

[00364] where P is the ToneMappingGamma as defined in clause 6.2.4 of SMPTE ST 2094-10 [1].
[00365] trim_chroma_weight specifies the chroma weight metadata. The value must be in the range 0 to 4095, inclusive. If trim_chroma_weight is not present, it must be inferred to be equal to 2048. Note that the 12-bit chroma weight value is calculated as follows:

    trim_chroma_weight = Clip3(0, 4095, Round((CW + 0.5) * 4096))

[00366] where CW is the ChromaCompensationWeight as defined in clause 6.3.1 of SMPTE ST 2094-10 [1].
[00367] trim_saturation_gain specifies the saturation gain metadata. The value must be in the range 0 to 4095, inclusive. If trim_saturation_gain is not present, it must be inferred to be equal to 2048. Note that the 12-bit saturation gain value is calculated as follows:

    trim_saturation_gain = Clip3(0, 4095, Round((SG + 0.5) * 4096))

[00368] where SG is the SaturationGain as defined in clause 6.3.2 of SMPTE ST 2094-10 [1].
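The five trim encodings above differ only in the 0.5 bias applied before scaling; a non-normative Python sketch (names are ours):

    import math

    TRIM_BIAS = {
        "trim_slope": -0.5, "trim_power": -0.5,            # (S - 0.5), (P - 0.5)
        "trim_offset": 0.5, "trim_chroma_weight": 0.5,     # (O + 0.5), (CW + 0.5)
        "trim_saturation_gain": 0.5,                       # (SG + 0.5)
    }

    def encode_trim(field, value):
        # 12-bit code for an ST 2094-10 trim value (S, O, P, CW or SG).
        x = (value + TRIM_BIAS[field]) * 4096
        r = ((x > 0) - (x < 0)) * math.floor(abs(x) + 0.5)
        return max(0, min(4095, r))

    # Example: the neutral slope S = 1.0 encodes as
    # encode_trim("trim_slope", 1.0) == 2048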
[00369] ms_weight: this field is reserved for future specification. This 13-bit signed integer shall be 0x1fff (-1).
[00370] active_area_left_offset, active_area_right_offset, active_area_top_offset and active_area_bottom_offset specify the selected pixels of the current image, in terms of a rectangular region specified in image coordinates, for the active area. The values must be in the range 0 to 8191, inclusive. See also ProcessingWindow of ST 2094-10 [1].
[00371] active_area_left_offset, active_area_right_offset, active_area_top_offset and active_area_bottom_offset represent the coordinates UpperLeftCorner and LowerRightCorner constrained in clause 7.1 of ST 2094-10 [1] as follows:

    UpperLeftCorner = (active_area_left_offset, active_area_top_offset)
    LowerRightCorner = (XSize - 1 - active_area_right_offset, YSize - 1 - active_area_bottom_offset)

[00372] where XSize is the horizontal resolution of the current image and YSize is the vertical resolution of the current image.
[00373] ext_dm_alignment_zero_bit shall be equal to 0.
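A minimal Python sketch of the corner derivation above (argument names are ours):

    def active_area_corners(left, right, top, bottom, xsize, ysize):
        upper_left = (left, top)
        lower_right = (xsize - 1 - right, ysize - 1 - bottom)
        return upper_left, lower_right

    # Example: for a 1920x1080 picture with all offsets 0, the active area
    # spans (0, 0) to (1919, 1079).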
[00374] Figure 7 is an example of a process 700 for processing video data. Process 700 can be performed by a video encoding system, such as the video encoding system of Figure 1, that implements ST 2094-10.
[00375] In step 702, process 700 of Figure 7 includes receiving the video data, wherein the video data includes at least two video signals. The at least two video signals can be related or unrelated and/or can be the same or different. For example, each video signal may have been captured by a different camera.
[00376] In step 704, process 700 includes obtaining one or more sets of color volume transformation parameters from the video data. As discussed above, color volume transformation parameters can include a transfer function, as well as variables and constants related to the transfer function. In various implementations, the transfer function, variables and constants can be used to compress a color volume into a smaller dynamic range.
[00377] In step 706, process 700 includes determining a display region for each of the at least two video signals, where the display regions determine a portion of a video frame in which the video signals will be displayed. In some cases, the display regions can be adjacent. In some cases, the display regions may overlap. In some cases, as with picture-in-picture, a display region may overlap with another display region.
[00378] In step 708, process 700 includes determining, for each of the at least two video signals, a respective association between a video signal among the at least two video signals and a set of color volume transformation parameters among the one or more sets of color volume transformation parameters, where the sets of color volume transformation parameters determine one or more display parameters for the display regions of the video signals. For example, a set of color volume transformation parameters can be used to modify the particular video signal with which the set of color volume transformation parameters is associated. In this example, the set of color volume transformation parameters can be used to compress the dynamic range of the video signal into a range that can be displayed by a particular display device.
[00379] In step 710, process 700 includes generating one or more metadata blocks for the one or more sets of color volume transformation parameters. The metadata blocks can be encoded, for example, in one or more SEI NAL units.
[00380] In step 712, process 700 includes generating an encoded bit stream for video data, wherein the encoded bit stream includes one or more blocks of metadata. The encoded bit stream can be generated, for example, using the AVC or HEVC standard, or another video encoding standard.
[00381] In step 714, process 700 includes encoding, in the encoded bit stream, the respective determined associations between the at least two video signals and the one or more sets of color volume parameters. In some implementations, encoding the associations may include placing the one or more metadata blocks in the encoded bit stream according to an order of the display regions within the video frame. For example, the one or more metadata blocks that contain the set of color volume transformation parameters for the first display region (in scan order) can be placed first in the encoded bit stream, then the metadata blocks that contain the set of color volume transformation parameters for the second display region (in scan order) can be placed next in the encoded bit stream, and so on.
[00382] In some implementations, encoding the determined associations between the at least two video signals and the one or more sets of color volume parameters may include inserting, in the encoded bit stream, one or more values that each encode the determined associations. For example, a data structure can be encoded in the bit stream, where the data structure indicates an association between a set of color volume parameters encoded in a specific set of metadata blocks and a display region.
[00383] In some cases, a first display region for a first video signal of the at least two video signals overlaps a second display region for a second video signal of the at least two video signals. In such cases, the set of color volume transformation parameters from the one or more sets of color volume transformation parameters to use in the overlapping region is determined by a priority between the first display region and the second display region. In some examples, the priority is based on an order in which the first display region and the second display region are displayed in the video frame. In some examples, the priority is based on a value provided by the video data. For example, a priority value can be encoded in the bit stream with each display region.
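One way to sketch the priority resolution (Python; the region representation and names are ours, under the assumption that an explicit priority value accompanies each region):

    def params_for_pixel(x, y, regions):
        # regions: dicts with 'rect' = (x0, y0, x1, y1), 'priority', 'params'.
        covering = [r for r in regions
                    if r["rect"][0] <= x <= r["rect"][2]
                    and r["rect"][1] <= y <= r["rect"][3]]
        if not covering:
            return None
        return max(covering, key=lambda r: r["priority"])["params"]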
[00384] Figure 8 is an example of a process 800 for processing video data. Process 800 can be implemented by a video encoding system that implements ST 2094-10.
[00385] In step 802, process 800 includes receiving an encoded bit stream, wherein the encoded bit stream includes at least two encoded video signals and one or more metadata blocks that include one or more sets of color volume transformation parameters. The at least two video signals can be related or unrelated and/or can be the same or different. For example, each video signal may have been captured by a different camera. The sets of color volume transformation parameters may include, for example, a transfer function and variables and/or constants related to the transfer function. The one or more metadata blocks can, for example, be encoded in one or more SEI NAL units in the encoded bit stream.
[00386] In step 804, process 800 includes determining a display region for each of the at least two encoded video signals. Each display region can correspond to an area of a display device screen (for example, a monitor, smartphone screen, tablet screen, etc.). Each video signal can be displayed in an individual display region (or possibly in multiple individual display regions).
[00387] In step 806, process 800 includes determining, for each of the at least two video signals, an association between a video signal among the at least two video signals and a set of color volume transformation parameters among the one or more sets of color volume transformation parameters.
[00388] In step 808, process 800 includes decoding the at least two encoded video signals using the respective associated set of color volume transformation parameters, wherein the respective associated set of color volume transformation parameters determines one or more display parameters for a corresponding display region. For example, a set of color volume transformation parameters can be used to compress the dynamic range of a video signal into a range that can be displayed by a particular display device.
[00389] In some implementations, the associations between the at least two video signals and the one or more sets of color volume transformation parameters are based on an order of the display regions. For example, a set of color volume transformation parameters that appears first in the encoded bit stream can be associated with the first display region (in scan order).
[00390] In some implementations, the associations between the at least two video signals and the one or more sets of color volume transformation parameters can be based on one or more values included in the encoded bit stream. For example, a data structure can be encoded in the bit stream, where the data structure includes values that associate a set of color volume transformation parameters with a particular display region.
[00391] In some cases, a first display region for a first video signal from the at least two video signals overlaps a second display region for a second video signal from the at least two video signals. In such cases, the set of color volume transformation parameters, from among the one or more sets of color volume transformation parameters, to use in the overlapping region is determined by a priority between the first display region and the second display region. For example, the priority can be based on an order in which the first display region and the second display region appear in the video frame. As another example, the priority can be based on a value provided by the video data.
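A minimal sketch of this priority rule follows, assuming rectangular regions and using signaling order as the priority (earlier region wins); a bit stream could instead carry an explicit priority value per region, as noted above. The region layout and coordinates are invented for the example.

```python
# Choose the parameter set for a pixel where two display regions overlap,
# using signaling order as the priority rule (lower index wins).
def param_set_for_pixel(x, y, regions):
    """regions: list of (x0, y0, x1, y1, param_set_id) in signaling order.
    Returns the parameter-set id of the highest-priority region covering
    (x, y), or None if no region covers the pixel."""
    for (x0, y0, x1, y1, ps) in regions:  # earlier entries have priority
        if x0 <= x <= x1 and y0 <= y <= y1:
            return ps
    return None

regions = [(0, 0, 959, 1079, 0),      # left half, parameter set 0
           (900, 0, 1919, 1079, 1)]   # right half, overlaps columns 900-959
print(param_set_for_pixel(930, 500, regions))  # overlap -> set 0 (priority)
```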
[00392] Figure 9 is an example of a process 900 for processing video data. Process 900 can be implemented by a video encoding system that implements ST 2094-10.
[00393] In step 902, process 900 includes receiving the video data, wherein the video data is associated with a color volume. As discussed above, a color volume can include at least a dynamic range and a color gamut, which together describe the depth and range of colors captured in the video data.
[00394] In step 904, process 900 includes obtaining a set of color volume transformation parameters from the video data, wherein the set of color volume transformation parameters can be used to transform the color volume. For example, the set of color volume transformation parameters can include a transfer function, variables, and constants. As a further example, the color volume transform parameters can be used to compress the dynamic range of the color volume into a range that can be displayed by a particular display device.
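As a purely illustrative stand-in for such a compression, the sketch below maps mastering-range luminance into a smaller display range with a generic power curve. ST 2094-10 defines its own transform parameters; the curve, the peak-luminance values, and the exponent here are placeholders, not the standard's function.

```python
# Illustrative only: compress scene luminance into a display's range with a
# generic tone curve. Not the ST 2094-10 transform.
def tone_map(nits: float, src_peak: float = 4000.0,
             dst_peak: float = 500.0, gamma: float = 0.6) -> float:
    """Map a luminance value (in cd/m^2) from the mastering range
    [0, src_peak] into the target display range [0, dst_peak]."""
    normalized = min(max(nits / src_peak, 0.0), 1.0)
    return dst_peak * (normalized ** gamma)

for v in (0.0, 100.0, 1000.0, 4000.0):
    print(f"{v:7.1f} nits -> {tone_map(v):6.1f} nits")
```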
[00395] In step 906, process 900 includes obtaining a set of master display color volume parameters, wherein the set of master display color volume parameters includes values determined when generating a master copy of the video data. The master display color volume parameters can reflect, for example, the depth and range of colors intended by the person who mastered the video data. In some examples, it is desirable that any copies of the video data, when displayed, come as close as possible to the depth and range of colors captured by the master display color volume parameters.
[00396] In step 908, process 900 includes generating one or more metadata blocks for the set of color volume transform parameters. The one or more metadata blocks can be encoded, for example, in one or more SEI NAL units.
[00397] In step 910, process 900 includes generating one or more additional metadata blocks for the set of master display color volume parameters. In various examples, the metadata blocks for the master display color volume parameters can also be encoded in SEI NAL units.
[00398] In step 912, process 900 includes generating an encoded bit stream for the video data, wherein the encoded bit stream includes the one or more metadata blocks and the one or more additional metadata blocks, and wherein inclusion of the one or more additional metadata blocks is required by the presence of the one or more metadata blocks in the encoded bit stream.
[00399] In some examples, the set of color volume transformation parameters includes a transfer characteristic, and, in the encoded bit stream, the one or more metadata blocks are excluded when the transfer characteristic does not match a particular value. For example, the transfer characteristic has a value of 16 when the ST 2084 transfer function is included in the set of color volume transform parameters, and has a value of 18 when the HLG transfer function is included. In these examples, the one or more metadata blocks are not included in the bit stream when the transfer characteristic does not have a value of 16 or 18.
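A minimal sketch of this constraint follows; the function name is illustrative, but the two permitted values come directly from the text above (16 for ST 2084/PQ, 18 for HLG).

```python
# Sketch of the constraint above: metadata blocks are only permitted when
# the transfer characteristic signals ST 2084 (value 16) or HLG (value 18).
ALLOWED_TRANSFER_CHARACTERISTICS = {16, 18}  # 16 = ST 2084 (PQ), 18 = HLG

def metadata_blocks_permitted(transfer_characteristics: int) -> bool:
    return transfer_characteristics in ALLOWED_TRANSFER_CHARACTERISTICS

assert metadata_blocks_permitted(16)
assert not metadata_blocks_permitted(1)  # e.g., BT.709 -> blocks excluded
```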
[00400] In some examples, the set of color volume transformation parameters and the set of master display color volume parameters include a same field. In these examples, the field is omitted from the one or more metadata blocks for the set of color volume transform parameters, based on the field being present in the one or more additional metadata blocks for the master display color volume parameters.
[00401] In some examples, the video data includes a plurality of processing windows. In some implementations, in the encoded bit stream, a quantity of the plurality of processing windows is restricted to a value between one and sixteen. This restriction sets expectations for decoders, such that decoders can expect no more than sixteen processing windows in an encoded bit stream. Similarly, in some examples, the video data includes a plurality of content description elements, and, in the encoded bit stream, a quantity of the plurality of content description elements is restricted to one. In some examples, the video data includes a plurality of target display elements, and, in the encoded bit stream, a quantity of the plurality of target display elements is restricted to a value between one and sixteen. These restrictions can limit the range of options that a decoder is expected to be able to handle.

[00402] In some examples, an encoded bit stream can include at least one metadata block for each access unit in the encoded bit stream, the metadata block including color volume transform parameters. That is, for each access unit, the encoded bit stream will include at least one metadata block that includes color volume transform parameters.
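The count restrictions just described can be expressed as a simple validation, for example on the encoder side or in a conformance checker. The sketch below is illustrative only; the parameter names are not syntax elements from the standard.

```python
# Sketch of the count restrictions described above.
def check_counts(num_processing_windows: int,
                 num_content_descriptions: int,
                 num_target_displays: int) -> list:
    errors = []
    if not 1 <= num_processing_windows <= 16:
        errors.append("processing windows must be between 1 and 16")
    if num_content_descriptions != 1:
        errors.append("exactly one content description element is required")
    if not 1 <= num_target_displays <= 16:
        errors.append("target display elements must be between 1 and 16")
    return errors

print(check_counts(17, 1, 4))  # -> flags the processing-window count
```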
[00403] In some examples, values defined as reserved are excluded from the encoded bit stream. For example, values reserved for an ext_block_level field in a metadata block (where the metadata block includes color volume transform parameters) can be excluded from an encoded bit stream.
[00404] In some implementations, the one or more metadata blocks for the color volume transform parameters each include a length value. In some examples, in the encoded bit stream, the length value is restricted to a multiple of eight. In some examples, the length value is restricted to a value between 0 and 255.
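The text leaves the unit of the length value implicit; the sketch below simply applies both stated checks (multiple of eight, range 0 to 255) to the signaled value, and the function name is illustrative.

```python
# Sketch of the two length restrictions on a metadata block described above.
def length_is_valid(length_value: int) -> bool:
    return length_value % 8 == 0 and 0 <= length_value <= 255

print(length_is_valid(128))  # True
print(length_is_valid(12))   # False: not a multiple of eight
```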
[00405] Figure 10 is an example of a process 1000 for processing video data. Process 1000 can be implemented by a video encoding system that implements ST 2094-10.
[00406] In step 1002, process 1000 includes receiving an encoded bit stream, wherein the encoded bit stream includes one or more metadata blocks that include an encoded set of color volume transform parameters. The color volume parameters can be used to reduce the dynamic range of video data included in the encoded bit stream, so that the video data can be displayed by a particular display device. In some examples, the metadata blocks are encoded in one or more SEI NAL units in the encoded bit stream.
[00407] In step 1004, process 1000 includes determining the presence of the one or more metadata blocks in the encoded bit stream.

[00408] In step 1006, process 1000 includes, based on determining the presence of the one or more metadata blocks in the encoded bit stream, determining that a presence of one or more additional metadata blocks is required in the encoded bit stream.
[00409] In step 1008, process 1000 includes determining that the encoded bit stream does not include the one or more additional metadata blocks that include an encoded set of master display color volume parameters. In some implementations, the presence, in the encoded bit stream, of metadata blocks that include the encoded set of color volume transformation parameters means that metadata blocks that include the master display color volume parameters must also be present in the encoded bit stream. The additional metadata blocks would otherwise be encoded in one or more SEI NAL units in the encoded bit stream.

[00410] In step 1010, process 1000 includes determining, based on the encoded bit stream not including the one or more additional metadata blocks, that the encoded bit stream does not conform to the requirement. A conforming bit stream is one that adheres to agreed standards. A non-conforming bit stream may not be parseable and/or playable by decoders that are compatible with the standards.
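The decision flow of steps 1004 through 1012 can be sketched as follows; the boolean flags are illustrative stand-ins for the actual parsing of the metadata blocks, and the function names are invented for the example.

```python
# Sketch of the conformance rule: if ST 2094-10 metadata blocks are present,
# matching master display color volume metadata blocks must also be present;
# otherwise the stream is rejected (step 1012: the part is not processed).
def stream_is_conforming(has_st2094_blocks: bool,
                         has_mastering_display_blocks: bool) -> bool:
    if has_st2094_blocks and not has_mastering_display_blocks:
        return False  # required additional metadata blocks are missing
    return True

def maybe_process(has_st2094: bool, has_mdcv: bool) -> None:
    if not stream_is_conforming(has_st2094, has_mdcv):
        print("non-conforming stream: skipping color volume processing")
        return
    print("processing color volume metadata")

maybe_process(has_st2094=True, has_mdcv=False)  # -> skipped
```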
[00411] In step 1012, process 1000 includes not processing at least a part of the encoded bit stream, based on the determination that the encoded bit stream does not conform to the requirements. Not processing a part of the bit stream can mean, for example, that metadata blocks that include the color volume transform parameters (for example, the SEI NAL units that contain the parameters) are not parsed, decoded, and/or otherwise used. Alternatively or additionally, not processing a part of the bit stream can mean, for example, not processing (for example, not decoding and/or not displaying) the video data associated with the color volume transformation parameters. Alternatively or additionally, not processing a part of the bit stream can mean not decoding or displaying the entire bit stream.

[00412] In some implementations, the encoded set of color volume transformation parameters includes a transfer characteristic. In these implementations, process 1000 further includes determining whether a value of the transfer characteristic is a particular value, such as a value indicating that the ST 2084 transfer function is included in the bit stream or a value indicating that the HLG transfer function is included in the bit stream. In these implementations, the encoded bit stream is non-conforming when the one or more metadata blocks are included in the encoded bit stream and the value of the transfer characteristic is not the particular value.
[00413] In some cases, the encoded set of color volume transform parameters and the set of master display color volume parameters include a same field. In such cases, the determination that the encoded bit stream is non-conforming is further based on the field being present in both the one or more metadata blocks and the one or more additional metadata blocks.
[00414] In some cases, the encoded set of color volume transform parameters and the set of master display color volume parameters include a same field, and the field is omitted from the one or more metadata blocks. In such cases, when decoding the set of color volume parameters, the decoding uses a value for the field from the encoded set of master display color volume parameters.
[00415] In some cases, the video data being processed includes a plurality of processing windows. In such cases, the determination that the encoded bit stream is non-conforming may additionally be based on an amount of the plurality of processing windows being greater than sixteen.
[00416] In some cases, video data includes a plurality of content description elements. In such cases, the determination that the encoded bit stream is non-conforming may additionally be based on an amount of the plurality of content description elements being greater than one.
[00417] In some cases, the video data includes a plurality of target display elements. In these cases, the determination that the encoded bit stream is non-conforming can be additionally based on an amount of the plurality of target display elements being greater than sixteen.
[00418] In some implementations, process 1000 can also include determining that the encoded bit stream does not include a metadata block for a particular access unit in the encoded bit stream. In these implementations, the determination that the encoded bit stream is non-conforming can further be based on the encoded bit stream not including a metadata block for the particular access unit.
[00419] In some implementations, process 1000 can also include determining that the encoded bit stream includes a reserved value. In these implementations, the determination that the encoded bit stream is non-conforming can further be based on the encoded bit stream including a reserved value.
[00420] In some implementations, the one or more metadata blocks each include a length value. In such cases, the determination that the encoded bit stream is non-conforming can further be based on the length value not being a multiple of eight. In some cases, the determination that the encoded bit stream is non-conforming can further be based on the length value being greater than 255.
[00421] The methods and operations discussed here can be implemented using compressed video, and can be implemented in an exemplary video encoding and decoding system (for example, system 100). In some examples, a system includes a source device that provides encoded video data to be decoded at a later time by a destination device. In particular, the source device provides the video data to the destination device via a computer-readable medium. The source device and the destination device can comprise any of a wide range of devices, including desktop computers, notebook (i.e., laptop) computers, tablet computers, set-top boxes, telephone handsets such as so-called smart phones, so-called smart pads, televisions, cameras, display devices, digital media players, video game consoles, video streaming devices, or the like. In some cases, the source device and the destination device may be equipped for wireless communication.
[00422] The destination device can receive the encoded video data to be decoded via the computer-readable medium. The computer-readable medium can comprise any type of medium or device capable of moving the encoded video data from the source device to the destination device. In one example, the computer-readable medium can comprise a communication medium to enable the source device to transmit encoded video data directly to the destination device in real time. The encoded video data can be modulated according to a communication standard, such as a wireless communication protocol, and transmitted to the destination device. The communication medium can comprise any wireless or wired communication medium, such as a radio frequency (RF) spectrum or one or more physical transmission lines. The communication medium can form part of a packet-based network, such as a local area network, a wide area network, or a global network such as the Internet. The communication medium can include routers, switches, base stations, or any other equipment that can be useful to facilitate communication from the source device to the destination device.
[00423] In some examples, encoded data can be output from the output interface to a storage device. Similarly, encoded data can be accessed from the storage device via the input interface. The storage device can include any of a variety of distributed or locally accessed data storage media, such as a hard drive, Blu-ray discs, DVDs, CD-ROMs, flash memory, volatile or non-volatile memory, or any other suitable digital storage media for storing encoded video data. In a further example, the storage device can correspond to a file server or another intermediate storage device that can store the encoded video generated by the source device. The destination device can access stored video data from the storage device via streaming or download. The file server can be any type of server capable of storing encoded video data and transmitting that encoded video data to the destination device. Example file servers include a web server (for example, for a web site), an FTP server, network attached storage (NAS) devices, or a local disk drive. The destination device can access the encoded video data through any standard data connection, including an Internet connection. This can include a wireless channel (for example, a Wi-Fi connection), a wired connection (for example, DSL, cable modem, etc.), or a combination of both that is suitable for accessing encoded video data stored on a file server. The transmission of encoded video data from the storage device can be a streaming transmission, a download transmission, or a combination thereof.
[00424] The techniques of this description are not necessarily limited to wireless applications or settings. The techniques can be applied to video coding in support of any of a variety of multimedia applications, such as over-the-air television broadcasts, cable television transmissions, satellite television transmissions, Internet streaming video transmissions, such as dynamic adaptive streaming over HTTP (DASH), digital video that is encoded onto a data storage medium, decoding of digital video stored on a data storage medium, or other applications. In some examples, the system can be configured to support one-way or two-way video transmission to support applications such as video streaming, video playback, video broadcasting, and/or video telephony.
[00425] In one example, the source device includes a video source, a video encoder, and an output interface. The destination device can include an input interface, a video decoder, and a display device. The video encoder of the source device can be configured to apply the techniques described here. In other examples, a source device and a destination device can include other components or arrangements. For example, the source device can receive video data from an external video source, such as an external camera. Likewise, the destination device can interface with an external display device, instead of including an integrated display device.
[00426] The exemplary system above is merely an example. Techniques for processing video data in parallel can be performed by any digital video encoding and/or decoding device. Although the techniques of this description are generally performed by a video encoding device, the techniques can also be performed by a video encoder/decoder, typically referred to as a CODEC. In addition, the techniques of this description can also be performed by a video preprocessor. The source device and the destination device are merely examples of such coding devices, in which the source device generates encoded video data for transmission to the destination device. In some examples, the source and destination devices can operate in a substantially symmetrical manner, such that each of the devices includes video encoding and decoding components. Therefore, the example systems can support one-way or two-way video transmission between video devices, for example, for video streaming, video playback, video broadcasting, or video telephony.
[00427] The video source can include a video capture device, such as a video camera, a video archive containing previously captured video, and/or a video feed interface to receive video from a video content provider. As a further alternative, the video source can generate computer graphics-based data as the source video, or a combination of live video, archived video, and computer-generated video. In some cases, if the video source is a video camera, the source device and the destination device can form so-called camera phones or video phones. As mentioned above, however, the techniques described in this description can be applicable to video coding in general, and can be applied to wireless and/or wired applications. In each case, the captured, pre-captured, or computer-generated video can be encoded by the video encoder. The encoded video information can then be output by the output interface onto the computer-readable medium.

[00428] As noted, the computer-readable medium can include transient media, such as a wireless broadcast or wired network transmission, or storage media (that is, non-transitory storage media), such as a hard disk, flash drive, compact disc, digital video disc, Blu-ray disc, or other computer-readable media. In some examples, a network server (not shown) can receive encoded video data from the source device and provide the encoded video data to the destination device, for example, via network transmission. Similarly, a computing device of a medium production facility, such as a disc stamping facility, can receive encoded video data from the source device and produce a disc containing the encoded video data. Therefore, the computer-readable medium can be understood to include one or more computer-readable media of various forms, in various examples.
[00429] The input interface of the destination device receives information from the computer-readable medium. The information of the computer-readable medium can include syntax information defined by the video encoder, which is also used by the video decoder, that includes syntax elements that describe characteristics and/or processing of blocks and other coded units, for example, groups of pictures (GOPs). A display device displays the decoded video data to a user, and can comprise any of a variety of display devices, such as a cathode ray tube (CRT), a liquid crystal display (LCD), a plasma display, an organic light-emitting diode (OLED) display, or another type of display device. Various embodiments of the invention have been described.
[00430] The specific details of the encoding device 104 and the decoding device 112 are shown in Figure 11 and Figure 12, respectively. Figure 11 is a block diagram illustrating an example encoding device 104 that can implement one or more of the techniques described in this description. The encoding device 104 can, for example, generate the syntax structures described here (for example, the syntax structures of a VPS, SPS, PPS, or other syntax elements). The encoding device 104 can perform intra-prediction and inter-prediction coding of video blocks within video slices. As previously described, intra-coding relies, at least in part, on spatial prediction to reduce or remove spatial redundancy within a given video frame or picture. Inter-coding relies, at least in part, on temporal prediction to reduce or remove temporal redundancy within adjacent or surrounding frames of a video sequence. Intra-mode (I mode) can refer to any of several spatial-based compression modes. Inter-modes, such as uni-directional prediction (P mode) or bi-prediction (B mode), can refer to any of several temporal-based compression modes.
[00431] The encoding device 104 includes a partition unit 35, prediction processing unit 41, filter unit 63, picture memory 64, adder 50, transform processing unit 52, quantization unit 54, and entropy encoding unit 56. The prediction processing unit 41 includes motion estimation unit 42, motion compensation unit 44, and intra-prediction processing unit 46. For video block reconstruction, the encoding device 104 also includes inverse quantization unit 58, inverse transform processing unit 60, and adder 62. Filter unit 63 is intended to represent one or more loop filters, such as a deblocking filter, an adaptive loop filter (ALF), and a sample adaptive offset (SAO) filter. Although filter unit 63 is shown in Figure 11 as an in-loop filter, in other configurations, filter unit 63 can be implemented as a post-loop filter. A post-processing device 57 can perform additional processing on encoded video data generated by the encoding device 104. The techniques of this description can, in some cases, be implemented by the encoding device 104. In other cases, however, one or more of the techniques of this description can be implemented by the post-processing device 57.

[00432] As shown in Figure 11, the encoding device 104 receives video data, and the partition unit 35 divides the data into video blocks. The partitioning can also include partitioning into slices, slice segments, tiles, or other larger units, as well as video block partitioning, for example, according to a quadtree structure of LCUs and CUs. The encoding device 104 generally illustrates the components that encode video blocks within a video slice to be encoded. The slice can be divided into multiple video blocks (and possibly into sets of video blocks referred to as tiles). The prediction processing unit 41 can select one of a plurality of possible coding modes, such as one of a plurality of intra-prediction coding modes or one of a plurality of inter-prediction coding modes, for the current video block based on error results (for example, coding rate and level of distortion, or the like). The prediction processing unit 41 can provide the resulting intra- or inter-coded block to adder 50 to generate residual block data and to adder 62 to reconstruct the coded block for use as a reference picture.
[00433] The intra-prediction processing unit 46 within the prediction processing unit 41 can perform intra-prediction coding of the current video block relative to one or more neighboring blocks in the same frame or slice as the current block to be coded, to provide spatial compression. The motion estimation unit 42 and the motion compensation unit 44 within the prediction processing unit 41 perform inter-prediction coding of the current video block relative to one or more predictive blocks in one or more reference pictures, to provide temporal compression.

[00434] The motion estimation unit 42 can be configured to determine the inter-prediction mode for a video slice according to a predetermined pattern for a video sequence. The predetermined pattern can designate video slices in the sequence as P slices, B slices, or GPB slices. The motion estimation unit 42 and the motion compensation unit 44 can be highly integrated, but are illustrated separately for conceptual purposes. Motion estimation, performed by the motion estimation unit 42, is the process of generating motion vectors, which estimate motion for video blocks. A motion vector, for example, can indicate the displacement of a prediction unit (PU) of a video block within a current video frame or picture relative to a predictive block within a reference picture.
[00435] A predictive block is a block that is found to closely match the PU of the video block to be coded in terms of pixel difference, which can be determined by the sum of absolute difference (SAD), sum of square difference (SSD), or other difference metrics. In some examples, the encoding device 104 can calculate values for sub-integer pixel positions of reference pictures stored in picture memory 64. For example, the encoding device 104 can interpolate values of one-quarter pixel positions, one-eighth pixel positions, or other fractional pixel positions of the reference picture. Therefore, the motion estimation unit 42 can perform a motion search relative to the full pixel positions and fractional pixel positions, and output a motion vector with fractional pixel precision.
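The two difference metrics named above can be sketched directly; the block contents below are random stand-ins for a current block and a near-matching candidate from a reference picture.

```python
# Minimal sketch of the block-matching error metrics mentioned above: SAD
# (sum of absolute differences) and SSD (sum of squared differences) between
# a candidate predictive block and the block being encoded.
import numpy as np

def sad(block: np.ndarray, candidate: np.ndarray) -> int:
    return int(np.abs(block.astype(np.int64) - candidate.astype(np.int64)).sum())

def ssd(block: np.ndarray, candidate: np.ndarray) -> int:
    diff = block.astype(np.int64) - candidate.astype(np.int64)
    return int((diff * diff).sum())

rng = np.random.default_rng(0)
cur = rng.integers(0, 256, size=(8, 8))
ref = cur + rng.integers(-2, 3, size=(8, 8))  # near match
print(sad(cur, ref), ssd(cur, ref))
```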
[00436] The motion estimation unit 42 calculates a motion vector for a PU of a video block in an inter-coded slice by comparing the position of the PU to the position of a predictive block of a reference picture. The reference picture can be selected from a first reference picture list (List 0) or a second reference picture list (List 1), each of which identifies one or more reference pictures stored in picture memory 64. The motion estimation unit 42 sends the calculated motion vector to the entropy encoding unit 56 and the motion compensation unit 44.
[00437] Motion compensation, performed by the motion compensation unit 44, can involve fetching or generating the predictive block based on the motion vector determined by motion estimation, possibly performing interpolations to sub-pixel precision. Upon receiving the motion vector for the PU of the current video block, the motion compensation unit 44 can locate the predictive block to which the motion vector points in a reference picture list. The encoding device 104 forms a residual video block by subtracting the pixel values of the predictive block from the pixel values of the current video block being coded, forming pixel difference values. The pixel difference values form residual data for the block, and can include both luma and chroma difference components. The adder 50 represents the component or components that perform this subtraction operation. The motion compensation unit 44 can also generate syntax elements associated with the video blocks and the video slice for use by the decoding device 112 in decoding the video blocks of the video slice.
[00438] The intra-prediction processing unit 46 can intra-predict a current block, as an alternative to the inter-prediction performed by the motion estimation unit 42 and the motion compensation unit 44, as described above. In particular, the intra-prediction processing unit 46 can determine an intra-prediction mode to use to encode a current block. In some examples, the intra-prediction processing unit 46 can encode a current block using various intra-prediction modes, for example, during separate encoding passes, and the intra-prediction processing unit 46 can select an appropriate intra-prediction mode to use from the tested modes. For example, the intra-prediction processing unit 46 can calculate rate-distortion values using a rate-distortion analysis for the various tested intra-prediction modes, and can select the intra-prediction mode having the best rate-distortion characteristics among the tested modes. Rate-distortion analysis generally determines an amount of distortion (or error) between an encoded block and an original, unencoded block that was encoded to produce the encoded block, as well as a bit rate (that is, a number of bits) used to produce the encoded block. The intra-prediction processing unit 46 can calculate ratios from the distortions and rates for the various encoded blocks to determine which intra-prediction mode exhibits the best rate-distortion value for the block.
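A common way to express such a selection is a Lagrangian cost J = D + lambda * R, minimized over the candidate modes; the sketch below illustrates this idea with invented mode names, distortion values, rates, and lambda, and is not the exact computation of any particular encoder.

```python
# Sketch of rate-distortion mode selection: each candidate mode is scored
# with a cost J = D + lambda * R, and the lowest-cost mode is chosen.
def best_mode(candidates, lam: float):
    """candidates: list of (mode, distortion, rate_in_bits)."""
    return min(candidates, key=lambda c: c[1] + lam * c[2])

modes = [("planar", 1200.0, 45), ("dc", 1350.0, 30), ("angular_26", 900.0, 80)]
print(best_mode(modes, lam=10.0))  # -> the mode minimizing D + lambda*R
```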
[00439] In any case, after selecting an intra-prediction mode for a block, the intra-prediction processing unit 46 can provide information indicative of the selected intra-prediction mode for the block to the entropy encoding unit 56. The entropy encoding unit 56 can encode the information indicating the selected intra-prediction mode. The encoding device 104 can include, in the transmitted bit stream, configuration data definitions of encoding contexts for various blocks, as well as indications of a most probable intra-prediction mode, an intra-prediction mode index table, and a modified intra-prediction mode index table to use for each of the contexts. The bit stream configuration data can include a plurality of intra-prediction mode index tables and a plurality of modified intra-prediction mode index tables (also referred to as codeword mapping tables).
[00440] After the prediction processing unit 41 generates the predictive block for the current video block via inter-prediction or intra-prediction, the encoding device 104 forms a residual video block by subtracting the predictive block from the current video block. The residual video data in the residual block can be included in one or more TUs and applied to the transform processing unit 52. The transform processing unit 52 transforms the residual video data into residual transform coefficients using a transform, such as a discrete cosine transform (DCT) or a conceptually similar transform. The transform processing unit 52 can convert the residual video data from a pixel domain to a transform domain, such as a frequency domain.

[00441] The transform processing unit 52 can send the resulting transform coefficients to the quantization unit 54. The quantization unit 54 quantizes the transform coefficients to further reduce the bit rate. The quantization process can reduce the bit depth associated with some or all of the coefficients. The degree of quantization can be modified by adjusting a quantization parameter. In some examples, the quantization unit 54 can then perform a scan of the matrix that includes the quantized transform coefficients. Alternatively, the entropy encoding unit 56 can perform the scan.
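The effect of the quantization parameter (QP) on the step size can be illustrated with a simplified scalar quantizer; in HEVC the nominal step size roughly doubles every six QP values. The sketch below uses that relation in floating point and is not the integer arithmetic of a real encoder.

```python
# Simplified scalar quantization sketch: QP controls the step size, trading
# bit rate for precision (step roughly doubles every 6 QP, as in HEVC).
def q_step(qp: int) -> float:
    return 2.0 ** ((qp - 4) / 6.0)

def quantize(coeff: float, qp: int) -> int:
    return round(coeff / q_step(qp))

def dequantize(level: int, qp: int) -> float:
    return level * q_step(qp)

c = 37.0
for qp in (22, 27, 32, 37):
    level = quantize(c, qp)
    print(qp, level, round(dequantize(level, qp), 2))
```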
[00442] Following quantization, the entropy encoding unit 56 entropy encodes the quantized transform coefficients. For example, the entropy encoding unit 56 can perform context-adaptive variable length coding (CAVLC), context-adaptive binary arithmetic coding (CABAC), syntax-based context-adaptive binary arithmetic coding (SBAC), probability interval partitioning entropy (PIPE) coding, or another entropy coding technique. Following the entropy coding by the entropy encoding unit 56, the encoded bit stream can be transmitted to the decoding device 112, or archived for later transmission or retrieval by the decoding device 112. The entropy encoding unit 56 can also entropy encode the motion vectors and the other syntax elements for the current video slice being coded.
[00443] The inverse quantization unit 58 and the inverse transform processing unit 60 apply inverse quantization and inverse transformation, respectively, to reconstruct the residual block in the pixel domain for later use as a reference block of a reference picture. The motion compensation unit 44 can calculate a reference block by adding the residual block to a predictive block of one of the reference pictures within a reference picture list. The motion compensation unit 44 can also apply one or more interpolation filters to the reconstructed residual block to calculate sub-integer pixel values for use in motion estimation. The adder 62 adds the reconstructed residual block to the motion-compensated prediction block produced by the motion compensation unit 44 to produce a reference block for storage in picture memory 64. The reference block can be used by the motion estimation unit 42 and the motion compensation unit 44 as a reference block to inter-predict a block in a subsequent video frame or picture.
[00444] In this manner, the encoding device 104 of Figure 11 represents an example of a video encoder configured to generate syntax for an encoded video bit stream. The encoding device 104 can, for example, generate VPS, SPS, and PPS parameter sets as described above. The encoding device 104 can perform any of the techniques described herein, including the processes described above with respect to Figures 7 and 8. The techniques of this description have generally been described with respect to the encoding device 104, but as mentioned above, some of the techniques of this description can also be implemented by the post-processing device 57.

[00445] Figure 12 is a block diagram illustrating an example decoding device 112. The decoding device 112 includes an entropy decoding unit 80, prediction processing unit 81, inverse quantization unit 86, inverse transform processing unit 88, adder 90, filter unit 91, and picture memory 92. The prediction processing unit 81 includes motion compensation unit 82 and intra-prediction processing unit 84. The decoding device 112 can, in some examples, perform a decoding pass generally reciprocal to the encoding pass described with respect to the encoding device 104 of Figure 11.
[00446] During the decoding process, the decoding device 112 receives an encoded video bit stream that represents video blocks of an encoded video slice and associated syntax elements sent by the encoding device 104. In some embodiments, the decoding device 112 can receive the encoded video bit stream from the encoding device 104. In some embodiments, the decoding device 112 can receive the encoded video bit stream from a network entity 79, such as a server, a media-aware network element (MANE), a video editor/splicer, or another device configured to implement one or more of the techniques described above. Network entity 79 may or may not include the encoding device 104. Some of the techniques described in this description can be implemented by network entity 79 before network entity 79 transmits the encoded video bit stream to the decoding device 112. In some video decoding systems, network entity 79 and the decoding device 112 may be parts of separate devices, while in other cases, the functionality described with respect to network entity 79 may be performed by the same device that comprises the decoding device 112.
[00447] The entropy decoding unit 80 of the decoding device 112 entropy decodes the bit stream to generate quantized coefficients, motion vectors, and other syntax elements. The entropy decoding unit 80 forwards the motion vectors and other syntax elements to the prediction processing unit 81. The decoding device 112 can receive the syntax elements at the video slice level and/or the video block level. The entropy decoding unit 80 can process and parse both fixed-length syntax elements and variable-length syntax elements in one or more parameter sets, such as a VPS, SPS, and PPS.
[00448] When the video slice is coded as an intra-coded (I) slice, the intra-prediction processing unit 84 of the prediction processing unit 81 can generate prediction data for a video block of the current video slice based on a signaled intra-prediction mode and data from previously decoded blocks of the current frame or picture. When the video frame is coded as an inter-coded (i.e., B, P, or GPB) slice, the motion compensation unit 82 of the prediction processing unit 81 produces predictive blocks for a video block of the current video slice based on the motion vectors and other syntax elements received from the entropy decoding unit 80. The predictive blocks can be produced from one of the reference pictures within a reference picture list. The decoding device 112 can construct the reference frame lists, List 0 and List 1, using default construction techniques based on reference pictures stored in picture memory 92.
[00449] The motion compensation unit 82 determines prediction information for a video block of the current video slice by parsing the motion vectors and other syntax elements, and uses the prediction information to produce the predictive blocks for the current video block being decoded. For example, the motion compensation unit 82 can use one or more syntax elements in a parameter set to determine a prediction mode (for example, intra- or inter-prediction) used to code the video blocks of the video slice, an inter-prediction slice type (for example, B slice, P slice, or GPB slice), construction information for one or more reference picture lists for the slice, motion vectors for each inter-coded video block of the slice, inter-prediction status for each inter-coded video block of the slice, and other information to decode the video blocks in the current video slice.
[00450] The motion compensation unit 82 can also perform interpolation based on interpolation filters. The motion compensation unit 82 can use interpolation filters as used by the encoding device 104 during encoding of the video blocks to calculate interpolated values for sub-integer pixels of reference blocks. In this case, the motion compensation unit 82 can determine the interpolation filters used by the encoding device 104 from the received syntax elements, and can use the interpolation filters to produce predictive blocks.
[00451] The inverse quantization unit 86 inverse quantizes, or de-quantizes, the quantized transform coefficients provided in the bit stream and decoded by the entropy decoding unit 80. The inverse quantization process can include use of a quantization parameter calculated by the encoding device 104 for each video block in the video slice to determine a degree of quantization and, likewise, a degree of inverse quantization that should be applied. The inverse transform processing unit 88 applies an inverse transform (for example, an inverse DCT or other suitable inverse transform), an inverse integer transform, or a conceptually similar inverse transform process, to the transform coefficients in order to produce residual blocks in the pixel domain.
[00452] After the motion compensation unit 82 generates the predictive block for the current video block based on the motion vectors and other syntax elements, the decoding device 112 forms a decoded video block by summing the residual blocks from the inverse transform processing unit 88 with the corresponding predictive blocks generated by the motion compensation unit 82. The adder 90 represents the component or components that perform this summation operation. If desired, loop filters (either in the coding loop or after the coding loop) can also be used to smooth pixel transitions, or to otherwise improve video quality. Filter unit 91 is intended to represent one or more loop filters, such as a deblocking filter, an adaptive loop filter (ALF), and a sample adaptive offset (SAO) filter. Although filter unit 91 is shown in Figure 12 as an in-loop filter, in other configurations, filter unit 91 can be implemented as a post-loop filter. The decoded video blocks in a given frame or picture are then stored in picture memory 92, which stores reference pictures used for subsequent motion compensation. Picture memory 92 also stores decoded video for later presentation on a display device, such as the video destination device 122 shown in Figure 1.
[00453] In the preceding description, aspects of the application are described with reference to specific embodiments thereof, but those skilled in the art will recognize that the invention is not limited thereto. Thus, although illustrative embodiments of the application have been described in detail herein, it is to be understood that the inventive concepts may be otherwise variously embodied and employed, and that the appended claims are intended to be construed to include such variations, except as limited by the prior art. Various features and aspects of the invention described above can be used individually or together. In addition, the embodiments can be used in any number of environments and applications beyond those described here without departing from the broader spirit and scope of the specification. The specification and drawings are, accordingly, to be regarded as illustrative rather than restrictive. For purposes of illustration, the methods were described in a particular order. It should be appreciated that, in alternative embodiments, the methods can be performed in an order different from that described.
[00454] Where the components are described as being configured to perform certain operations, such configuration can be performed, for example, by design of electronic circuits or other hardware to perform the operation, by programming programmable electronic circuits (for example, microprocessors, or other suitable electronic circuits) to perform the operation, or any combination thereof.
[00455] The various illustrative logical blocks, modules, circuits, and algorithm steps described in connection with the embodiments disclosed herein can be implemented as electronic hardware, computer software, firmware, or combinations thereof. To clearly illustrate this interchangeability of hardware and software, various illustrative components, blocks, modules, circuits, and steps have been described above generally in terms of their functionality. Whether such functionality is implemented as hardware or software depends on the particular application and the design constraints imposed on the overall system. Those skilled in the art can implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
[00456] The techniques described herein can also be implemented in electronic hardware, computer software, firmware, or any combination thereof. Such techniques can be implemented in any of a variety of devices, such as general purpose computers, wireless communication device handsets, or integrated circuit devices having multiple uses, including application in wireless communication device handsets and other devices. Any features described as modules or components can be implemented together in an integrated logic device or separately as discrete but interoperable logic devices. If implemented in software, the techniques can be performed at least in part by a computer-readable data storage medium comprising program code that includes instructions that, when executed, perform one or more of the methods described above. The computer-readable data storage medium can form part of a computer program product, which can include packaging materials. The computer-readable medium can comprise memory or data storage media, such as random access memory (RAM) such as synchronous dynamic random access memory (SDRAM), read-only memory (ROM), non-volatile random access memory (NVRAM), electrically erasable programmable read-only memory (EEPROM), FLASH memory, magnetic or optical data storage media, and the like. The techniques can additionally, or alternatively, be performed at least in part by a computer-readable communication medium that carries or communicates program code in the form of instructions or data structures and that can be accessed, read, and/or executed by a computer, such as propagated signals or waves.
[00457] The program code can be executed by a processor, which can include one or more processors, such as one or more digital signal processors (DSPs), general purpose microprocessors, application-specific integrated circuits (ASICs), field programmable gate arrays (FPGAs), or other equivalent integrated or discrete logic circuitry. Such a processor can be configured to perform any of the techniques described in this description. A general purpose processor can be a microprocessor; but in the alternative, the processor can be any conventional processor, controller, microcontroller, or state machine. A processor can also be implemented as a combination of computing devices, for example, a combination of a DSP and a microprocessor, a plurality of microprocessors, one or more microprocessors in conjunction with a DSP core, or any other such configuration. Accordingly, the term processor, as used herein, can refer to any of the foregoing structure, any combination of the foregoing structure, or any other structure or apparatus suitable for implementation of the techniques described herein. In addition, in some aspects, the functionality described here can be provided within dedicated software modules or hardware modules configured for encoding and decoding, or incorporated in a combined video encoder-decoder (CODEC).
Claims (28)
1. Method for processing video data, comprising:
receiving the video data, where the video data includes at least two video signals;
obtaining one or more sets of color volume transformation parameters from the video data;
determining a display region for each of the at least two video signals, where the display regions determine a portion of a video frame in which the video signals will be displayed;
determining, for each of the at least two video signals, a respective association between a video signal among the at least two video signals and a set of color volume transformation parameters among the one or more sets of color volume transformation parameters, wherein the sets of color volume transformation parameters determine one or more display parameters for the display regions for the video signals;
generating one or more metadata blocks for the one or more sets of color volume transform parameters;
generating an encoded bit stream for the video data, wherein the encoded bit stream includes the one or more metadata blocks; and
encoding, in the encoded bit stream, the respective determined associations between the at least two video signals and the one or more sets of color volume parameters.
2. A method according to claim 1, wherein encoding the respective determined associations between the at least two video signals and the one or more sets of color volume parameters includes placing the one or more metadata blocks in the encoded bit stream according to an order of the display regions within the video frame.
3. A method according to claim 1, wherein encoding the respective determined associations between the at least two video signals and the one or more sets of color volume parameters includes inserting, into the encoded bit stream, one or more values that each indicate the respective determined associations.
4. A method according to claim 1, wherein a first display region for a first video signal among the at least two video signals overlaps a second display region for a second video signal among the at least two video signals, and wherein the set of color volume transformation parameters, among the one or more sets of color volume transformation parameters, to use in the overlap region is determined by a priority between the first display region and the second display region.
5. A method according to claim 4, wherein the priority is based on an order in which the first display region and the second display region are displayed in the video frame.
6. Method according to claim 4, wherein the priority is based on a value provided by the video data.
7. A method according to claim 1, wherein the one or more metadata blocks are encoded in one or more supplemental enhancement information (SEI) network abstraction layer (NAL) units.
8. Apparatus for processing video data, comprising:
a memory configured to store video data, wherein the video data includes at least two video signals; and a processor configured to:
obtain one or more sets of color volume transformation parameters from the video data;
determine a display region for each of the at least two video signals, wherein the display regions determine portions of a video frame in which the video signals will be displayed;
determine, for each of the at least two video signals, an association between a video signal among the at least two video signals and a set of color volume transformation parameters among the one or more sets of color volume transformation parameters, wherein the sets of color volume transformation parameters determine one or more display parameters for the display regions for the video signals;
generate one or more metadata blocks for the one or more sets of color volume transform parameters;
generate an encoded bit stream for the video data, wherein the encoded bit stream includes the one or more metadata blocks; and
encode, in the encoded bit stream, the respective determined associations between the at least two video signals and the one or more sets of color volume parameters.
9. Apparatus according to claim 8, wherein encoding the respective determined associations between the at least two video signals and the one or more sets of color volume parameters includes placing the one or more metadata blocks in the encoded bit stream according to an order of the display regions within the video frame.
10. Apparatus according to claim 8, wherein encoding the respective determined associations between the at least two video signals and the one or more sets of color volume parameters includes inserting, into the encoded bit stream, one or more values that each indicate the respective determined associations.
11. Apparatus according to claim 8, wherein a first display region for a first video signal among the at least two video signals overlaps a second display region for a second video signal among the at least two video signals, and wherein the set of color volume transformation parameters, among the one or more sets of color volume transformation parameters, to use in the overlap region is determined by a priority between the first display region and the second display region.
12. Apparatus according to claim 8, wherein the one or more metadata blocks are encoded in one or more supplemental enhancement information (SEI) network abstraction layer (NAL) units.
13. A non-transitory computer-readable medium having stored thereon instructions that, when executed by one or more processors, cause the one or more processors to:
receive video data, wherein the video data includes at least two video signals;
obtain one or more sets of color volume transformation parameters from the video data;
determine a display region for each of the at least two video signals, wherein the display regions determine portions of a video frame in which the video signals will be displayed;
determine, for each of the at least two video signals, an association between a video signal among the at least two video signals and a set of color volume transformation parameters among the one or more sets of color volume transformation parameters, wherein the sets of color volume transformation parameters determine one or more display parameters for the display regions for the video signals;
generate one or more metadata blocks for the one or more sets of color volume transform parameters;
generate an encoded bit stream for the video data, wherein the encoded bit stream includes the one or more metadata blocks; and
encode, in the encoded bit stream, the respective determined associations between the at least two video signals and the one or more sets of color volume parameters.
14. Apparatus for processing video data, comprising:
means for receiving the video data, wherein the video data includes at least two video signals;
means for obtaining one or more sets of color volume transformation parameters from the video data;
means for determining a display region for each of the at least two video signals, wherein the display regions determine portions of a video frame in which the video signals will be displayed;
means for determining, for each of the at least two video signals, an association between a video signal among the at least two video signals and a set of color volume transformation parameters among the one or more sets of color volume transformation parameters, wherein the sets of color volume transformation parameters determine one or more display parameters for the display regions for the video signals;
means for generating one or more metadata blocks for the one or more sets of color volume transform parameters;
means for generating an encoded bit stream for the video data, wherein the encoded bit stream includes the one or more metadata blocks; and
means for encoding, in the encoded bit stream, the respective determined associations between the at least two video signals and the one or more sets of color volume parameters.
15. Method of processing video data, comprising:
receiving an encoded bit stream, wherein the encoded bit stream includes at least two encoded video signals and one or more metadata blocks that include one or more sets of color volume transform parameters;
determining a display region for each of the at least two encoded video signals;
determining, for each of the at least two encoded video signals, an association between a video signal among the at least two encoded video signals and a set of color volume transform parameters among the one or more sets of color volume transform parameters; and
decoding the at least two encoded video signals using a respective associated set of color volume transform parameters, wherein the respective associated set of color volume transform parameters determines one or more display parameters for a corresponding display region.
[16]
16. The method of claim 15, wherein the respective associations between the at least two encoded video signals and the one or more sets of color volume transform parameters are based on an order of the display regions.
[17]
17. The method of claim 15, wherein the respective associations between the at least two encoded video signals and the one or more sets of color volume transform parameters are based on one or more values included in the encoded bit stream.
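A sketch contrasting the two association schemes of claims 16 and 17; `param_index_per_region` is a hypothetical decoded syntax element, not a name from the specification, and parallel-length lists are assumed:

```python
def associate_by_order(regions, param_sets):
    # Claim 16 style: the association follows the order of the display
    # regions, so the i-th region takes the i-th parameter set.
    assert len(param_sets) >= len(regions)
    return {i: param_sets[i] for i in range(len(regions))}

def associate_by_signaled_values(param_index_per_region, param_sets):
    # Claim 17 style: the encoded bit stream carries explicit values
    # (here, one index per region) selecting a parameter set.
    return {i: param_sets[idx]
            for i, idx in enumerate(param_index_per_region)}
```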
[18]
18. The method of claim 15, wherein a first display region for a first video signal among the at least two encoded video signals overlaps a second display region for a second video signal among the at least two encoded video signals, and wherein a set of color volume transform parameters among the one or more sets of color volume transform parameters to use in the overlap region is determined by a priority between the first display region and the second display region.
[19]
19. The method of claim 18, wherein the priority is based on an order in which the first display region and the second display region are displayed in a video frame.
[20]
20. The method of claim 18, wherein the priority is based on a value provided by the video data.
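An illustrative sketch of the overlap rule in claims 18 to 20: where two regions cover the same sample, a priority, derived either from display order (claim 19) or from a value carried in the video data (claim 20), selects the parameter set. All names are hypothetical, and `DisplayRegion` is the type from the earlier sketch:

```python
def params_at(x, y, regions, priorities, param_sets):
    """Return the parameter set governing the sample at (x, y).

    `regions`, `priorities`, and `param_sets` are parallel lists; a
    higher priority value wins inside an overlap region.
    """
    covering = [i for i, r in enumerate(regions)
                if r.x <= x < r.x + r.width and r.y <= y < r.y + r.height]
    if not covering:
        return None  # sample lies outside every display region
    return param_sets[max(covering, key=lambda i: priorities[i])]
```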
[21]
21. The method of claim 18, wherein the one or more metadata blocks are encoded in one or more supplemental enhancement information (SEI) Network Abstraction Layer (NAL) units.
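Finally, a toy sketch of carrying a metadata block in an SEI NAL unit as in claim 21. Real HEVC SEI syntax (multi-byte payload sizes, emulation prevention, RBSP trailing bits) is omitted, and the registered-user-data framing shown is only one common way ST 2094-10 metadata is conveyed, not necessarily the one the claims require:

```python
def wrap_in_prefix_sei(metadata_block: bytes) -> bytes:
    """Wrap a metadata block in a minimal HEVC prefix SEI NAL unit.

    Simplified: assumes the payload type and size each fit in one byte
    and skips emulation prevention and RBSP trailing bits.
    """
    PREFIX_SEI_NUT = 39  # HEVC nal_unit_type for a prefix SEI NAL unit
    nal_header = bytes([PREFIX_SEI_NUT << 1, 0x01])  # layer id 0, temporal id plus 1 = 1
    payload_type = bytes([4])                  # user_data_registered_itu_t_t35
    payload_size = bytes([len(metadata_block)])
    return nal_header + payload_type + payload_size + metadata_block
```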
[22]
22. An apparatus for processing video data, comprising:
a memory configured to store video data; and
a processor configured to:
receive an encoded bit stream, wherein the encoded bit stream includes at least two encoded video signals and one or more metadata blocks that include one or more sets of color volume transform parameters;
determine a display region for each of the at least two encoded video signals;
determine, for each of the at least two encoded video signals, an association between a video signal among the at least two encoded video signals and a set of color volume transform parameters among the one or more sets of color volume transform parameters; and
decode the at least two encoded video signals using a respective associated set of color volume transform parameters, wherein the respective associated set of color volume transform parameters determines one or more display parameters for a corresponding display region.
[23]
23. The apparatus of claim 22, wherein the respective associations between the at least two encoded video signals and the one or more sets of color volume transform parameters are based on an order of the display regions.
[24]
24. The apparatus of claim 22, wherein the respective associations between the at least two encoded video signals and the one or more sets of color volume transform parameters are based on one or more values included in the encoded bit stream.
[25]
25. The apparatus of claim 22, wherein a first display region for a first video signal among the at least two encoded video signals overlaps a second display region for a second video signal among the at least two encoded video signals, and wherein a set of color volume transform parameters among the one or more sets of color volume transform parameters to use in the overlap region is determined by a priority between the first display region and the second display region.
[26]
26. The apparatus of claim 22, wherein the one or more metadata blocks are encoded in one or more supplemental enhancement information (SEI) Network Abstraction Layer (NAL) units.
[27]
27. A non-transitory computer-readable medium having instructions stored thereon that, when executed by one or more processors, cause the one or more processors to:
receive an encoded bit stream, wherein the encoded bit stream includes at least two encoded video signals and one or more metadata blocks that include one or more sets of color volume transform parameters;
determine a display region for each of the at least two encoded video signals;
determine, for each of the at least two encoded video signals, an association between a video signal among the at least two encoded video signals and a set of color volume transform parameters among the one or more sets of color volume transform parameters; and
decode the at least two encoded video signals using a respective associated set of color volume transform parameters, wherein the respective associated set of color volume transform parameters determines one or more display parameters for a corresponding display region.
[28]
28. An apparatus for processing video data, comprising:
means for receiving an encoded bit stream, wherein the encoded bit stream includes at least two encoded video signals and one or more metadata blocks that include one or more sets of color volume transform parameters;
means for determining a display region for each of the at least two encoded video signals;
means for determining, for each of the at least two encoded video signals, an association between a video signal among the at least two encoded video signals and a set of color volume transform parameters among the one or more sets of color volume transform parameters; and
means for decoding the at least two encoded video signals using a respective associated set of color volume transform parameters, wherein the respective associated set of color volume transform parameters determines one or more display parameters for a corresponding display region.
Similar technologies:
Publication number | Publication date | Patent title
BR112019010515A2|2019-10-01|Systems and methods for signaling and constraining a high dynamic range (HDR) video system with dynamic metadata
ES2866180T3|2021-10-19|Supplemental Enhancement Information (SEI) messages for wide color gamut, high dynamic range video coding
US11102495B2|2021-08-24|Methods and systems for generating and processing content color volume messages for video
KR20190039958A|2019-04-16|Color space adaptation by feedback channel
JP2018515018A|2018-06-07|Dynamic range adjustment for high dynamic range and wide color gamut video coding
KR20180054782A|2018-05-24|Fixed-point implementation of scoping of components in video coding
KR102158418B1|2020-09-22|Improved color remapping information supplement enhancement information message processing
KR102277879B1|2021-07-14|Signaling Mechanisms for Same Ranges and Different DRA Parameters for Video Coding
BR112020012101A2|2020-11-17|Quantization parameter control for video encoding with joint transform/pixel based quantization
US10368099B2|2019-07-30|Color remapping information SEI message signaling for display adaptation
BR112020020594A2|2021-01-12|TRANSFORM-BASED QUANTIZATION HARMONIZATION AND DYNAMIC RANGE ADJUSTMENT SCALE DERIVATION IN VIDEO CODING
KR20210045458A|2021-04-26|Inverse quantization apparatus and method
BR112021003741A2|2021-05-25|encoder, decoder and corresponding methods using a palette encoding
TWI755363B|2022-02-21|Supplemental enhancement information (SEI) messages for high dynamic range and wide color gamut video coding
WO2021212014A1|2021-10-21|Flexible chroma processing for dynamic range adjustment
BR112021003999A2|2021-05-25|relationship between partition constraint elements
Family patents:
Publication number | Publication date
CN109983772B|2022-02-22|
AU2017367694A1|2019-05-02|
US10812820B2|2020-10-20|
US20180152721A1|2018-05-31|
CN109964485B|2022-01-11|
TW201824867A|2018-07-01|
EP3549342A1|2019-10-09|
EP3549343A1|2019-10-09|
WO2018102604A1|2018-06-07|
JP2020501430A|2020-01-16|
KR20190089888A|2019-07-31|
WO2018102605A1|2018-06-07|
KR20190089889A|2019-07-31|
AU2017368247A1|2019-05-02|
JP2020501426A|2020-01-16|
CN109964485A|2019-07-02|
CN109983772A|2019-07-05|
TW201824866A|2018-07-01|
BR112019010468A2|2019-09-10|
US10979729B2|2021-04-13|
US20180152703A1|2018-05-31|
Cited references:
Publication number | Application date | Publication date | Applicant | Patent title

US6278435B1|1998-04-03|2001-08-21|Tektronix, Inc.|Compression and acquisition count optimization in a digital oscilloscope variable intensity rasterizer|
JP2001333348A|2000-05-24|2001-11-30|Minolta Co Ltd|Transmitter, receiver, broadcast system and broadcast method|
US7414753B2|2004-05-06|2008-08-19|Canon Kabushiki Kaisha|Color characterization using nonlinear regression|
WO2010105036A1|2009-03-13|2010-09-16|Dolby Laboratories Licensing Corporation|Layered compression of high dynamic range, visual dynamic range, and wide color gamut video|
WO2012122423A1|2011-03-10|2012-09-13|Dolby Laboratories Licensing Corporation|Pre-processing for bitdepth and color format scalable video coding|
WO2012147018A2|2011-04-28|2012-11-01|Koninklijke Philips Electronics N.V.|Apparatuses and methods for hdr image encoding and decoding|
US20150201199A1|2011-12-07|2015-07-16|Google Inc.|Systems and methods for facilitating video encoding for screen-sharing applications|
US10136152B2|2014-03-24|2018-11-20|Qualcomm Incorporated|Use of specific HEVC SEI messages for multi-layer video codecs|
US9942575B1|2014-11-13|2018-04-10|Google Llc|Assigning videos to single-stream and multi-stream decoders|
WO2016089093A1|2014-12-04|2016-06-09|엘지전자 주식회사|Broadcasting signal transmission and reception method and device|
TW201717627A|2015-07-28|2017-05-16|Vid衡器股份有限公司|High dynamic range video coding architectures with multiple operating modes|
CN112866754A|2015-09-07|2021-05-28|Lg 电子株式会社|Broadcast signal transmitting apparatus and method, and broadcast signal receiving apparatus and method|
AU2015227469A1|2015-09-17|2017-04-06|Canon Kabushiki Kaisha|Method, apparatus and system for displaying video data|
JP6132006B1|2015-12-02|2017-05-24|日本電気株式会社|Video encoding device, video system, video encoding method, and video encoding program|
WO2017116419A1|2015-12-29|2017-07-06|Thomson Licensing|Method and apparatus for metadata insertion pipeline for streaming media|
WO2017171391A1|2016-03-30|2017-10-05|엘지전자 주식회사|Method and apparatus for transmitting and receiving broadcast signals|
US10812820B2|2016-11-30|2020-10-20|Qualcomm Incorporated|Systems and methods for signaling and constraining a high dynamic range (HDR) video system with dynamic metadata|
US10880557B2|2015-06-05|2020-12-29|Fastvdo Llc|High dynamic range image/video coding|
US10225561B2|2015-10-08|2019-03-05|Mediatek Inc.|Method and apparatus for syntax signaling in image and video compression|
US10257394B2|2016-02-12|2019-04-09|Contrast, Inc.|Combined HDR/LDR video streaming|
US10264196B2|2016-02-12|2019-04-16|Contrast, Inc.|Systems and methods for HDR video capture with a mobile device|
CN108781290A|2016-03-07|2018-11-09|皇家飞利浦有限公司|HDR videos are coded and decoded|
US10834400B1|2016-08-19|2020-11-10|Fastvdo Llc|Enhancements of the AV1 video codec|
US10812820B2|2016-11-30|2020-10-20|Qualcomm Incorporated|Systems and methods for signaling and constraining a high dynamic range (HDR) video system with dynamic metadata|
US10200687B2|2017-06-02|2019-02-05|Apple Inc.|Sample adaptive offset for high dynamic range (HDR) video compression|
US10306307B2|2017-06-30|2019-05-28|Apple Inc.|Automatic configuration of video output settings for video source|
US10951888B2|2018-06-04|2021-03-16|Contrast, Inc.|Compressed high dynamic range video|
WO2021007742A1|2019-07-15|2021-01-21|上海极清慧视科技有限公司|Compression method for obtaining video file, decompression method, system, and storage medium|
CN110691194B|2019-09-19|2021-04-20|锐迪科微电子(上海)有限公司|Wide color gamut image determination method and device|
KR102302755B1|2019-12-30|2021-09-16|재단법인 경주스마트미디어센터|DRM contents parallel packaging device and system comprising it and method for DRM contents parallel packaging|
Legal status:
2021-10-05| B350| Update of information on the portal [chapter 15.35 patent gazette]|
Priority:
Application number | Application date | Patent title
US201662428511P| true| 2016-11-30|2016-11-30|
US15/826,549|US10812820B2|2016-11-30|2017-11-29|Systems and methods for signaling and constraining a high dynamic range (HDR) video system with dynamic metadata|
PCT/US2017/064058|WO2018102605A1|2016-11-30|2017-11-30|Systems and methods for signaling and constraining a high dynamic range (HDR) video system with dynamic metadata|